[ 
https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-855:
-----------------------------------

    Attachment: MAHOUT-855.patch

Here's a fix, going to commit shortly
                
> LuceneTextValueEncoder doesn't properly set internal buffers, causing 
> BufferUnderflowException
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-855
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-855
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: MAHOUT-855.patch
>
>
> The LuceneTextValueEncoder throws an BufferUnderflowException when used.  See 
> the code below.  The problem appears to be due to the CharBuffer not getting 
> values, but I'm not sure yet.
> {code}
> @Test
>   public void testLucene() throws Exception {
>     LuceneTextValueEncoder enc = new LuceneTextValueEncoder("text");
>     enc.setAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_34));
>     Vector v1 = new DenseVector(200);
>     enc.addToVector("test1 and more", v1);
>     enc.flush(1, v1);
> }
> {code}
> Here's the exception:
> {quote}
> java.nio.BufferUnderflowException
>       at java.nio.HeapCharBuffer.get(HeapCharBuffer.java:127)
>       at 
> org.apache.mahout.vectorizer.encoders.LuceneTextValueEncoder$CharSequenceReader.read(LuceneTextValueEncoder.java:87)
>       at org.apache.lucene.analysis.CharReader.read(CharReader.java:54)
>       at 
> org.apache.lucene.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:181)
>       at 
> org.apache.lucene.analysis.CharTokenizer.incrementToken(CharTokenizer.java:273)
>       at 
> org.apache.mahout.common.lucene.TokenStreamIterator.computeNext(TokenStreamIterator.java:41)
>       at 
> org.apache.mahout.common.lucene.TokenStreamIterator.computeNext(TokenStreamIterator.java:30)
>       at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
>       at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
>       at 
> org.apache.mahout.vectorizer.encoders.TextValueEncoder.addText(TextValueEncoder.java:78)
>       at 
> org.apache.mahout.vectorizer.encoders.TextValueEncoder.addText(TextValueEncoder.java:69)
>       at 
> org.apache.mahout.vectorizer.encoders.TextValueEncoder.addToVector(TextValueEncoder.java:59)
>       at 
> org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder.addToVector(FeatureVectorEncoder.java:86)
>       at 
> org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder.addToVector(FeatureVectorEncoder.java:63)
>       at 
> org.apache.mahout.vectorizer.encoders.TextValueEncoderTest.testLucene(TextValueEncoderTest.java:75)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>       at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
>       at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62)
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to