[ https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll updated MAHOUT-855:
-----------------------------------
    Attachment: MAHOUT-855.patch

Here's a fix. Going to commit shortly.

> LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-855
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-855
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: MAHOUT-855.patch
>
>
> The LuceneTextValueEncoder throws a BufferUnderflowException when used. See the code below. The problem appears to be due to the CharBuffer not getting values, but I'm not sure yet.
> {code}
> @Test
> public void testLucene() throws Exception {
>   LuceneTextValueEncoder enc = new LuceneTextValueEncoder("text");
>   enc.setAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_34));
>   Vector v1 = new DenseVector(200);
>   enc.addToVector("test1 and more", v1);
>   enc.flush(1, v1);
> }
> {code}
> Here's the exception:
> {quote}
> java.nio.BufferUnderflowException
>     at java.nio.HeapCharBuffer.get(HeapCharBuffer.java:127)
>     at org.apache.mahout.vectorizer.encoders.LuceneTextValueEncoder$CharSequenceReader.read(LuceneTextValueEncoder.java:87)
>     at org.apache.lucene.analysis.CharReader.read(CharReader.java:54)
>     at org.apache.lucene.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:181)
>     at org.apache.lucene.analysis.CharTokenizer.incrementToken(CharTokenizer.java:273)
>     at org.apache.mahout.common.lucene.TokenStreamIterator.computeNext(TokenStreamIterator.java:41)
>     at org.apache.mahout.common.lucene.TokenStreamIterator.computeNext(TokenStreamIterator.java:30)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
>     at org.apache.mahout.vectorizer.encoders.TextValueEncoder.addText(TextValueEncoder.java:78)
>     at org.apache.mahout.vectorizer.encoders.TextValueEncoder.addText(TextValueEncoder.java:69)
>     at org.apache.mahout.vectorizer.encoders.TextValueEncoder.addToVector(TextValueEncoder.java:59)
>     at org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder.addToVector(FeatureVectorEncoder.java:86)
>     at org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder.addToVector(FeatureVectorEncoder.java:63)
>     at org.apache.mahout.vectorizer.encoders.TextValueEncoderTest.testLucene(TextValueEncoderTest.java:75)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>     at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
>     at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62)
> {quote}
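
For readers following the trace: the exception surfaces in CharBuffer.get, called from the encoder's CharSequenceReader.read while Lucene's tokenizer fills its buffer. One plausible way to hit this (an assumption, since the report itself says the cause is not yet confirmed) is a Reader that asks the CharBuffer for more characters than it has remaining. The sketch below is not the content of MAHOUT-855.patch; the class names UnclampedReader and ClampedReader are hypothetical and only illustrate the failure mode and one way to guard against it using standard java.nio calls.

```java
import java.io.Reader;
import java.nio.BufferUnderflowException;
import java.nio.CharBuffer;

// Illustrative only: these classes are hypothetical and are not Mahout code.
final class CharBufferReaderSketch {

  /** Copies exactly len chars, ignoring how many are left in the buffer. */
  static final class UnclampedReader extends Reader {
    private final CharBuffer buf;

    UnclampedReader(CharSequence input) {
      this.buf = CharBuffer.wrap(input);
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
      // CharBuffer.get throws BufferUnderflowException when len > buf.remaining().
      buf.get(cbuf, off, len);
      return len;
    }

    @Override
    public void close() {
      // nothing to release
    }
  }

  /** Same idea, but clamps the request and signals end of stream with -1. */
  static final class ClampedReader extends Reader {
    private final CharBuffer buf;

    ClampedReader(CharSequence input) {
      this.buf = CharBuffer.wrap(input);
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
      if (len == 0) {
        return 0;
      }
      if (!buf.hasRemaining()) {
        return -1; // Reader contract: -1 once the input is exhausted
      }
      int n = Math.min(len, buf.remaining());
      buf.get(cbuf, off, n);
      return n;
    }

    @Override
    public void close() {
      // nothing to release
    }
  }

  public static void main(String[] args) {
    String text = "test1 and more";
    char[] out = new char[64]; // deliberately larger than the input

    try {
      new UnclampedReader(text).read(out, 0, out.length);
    } catch (BufferUnderflowException e) {
      System.out.println("unclamped read failed: " + e);
    }

    ClampedReader reader = new ClampedReader(text);
    int n = reader.read(out, 0, out.length);
    System.out.println("clamped read returned " + n + " chars: " + new String(out, 0, n));
    System.out.println("next read returned " + reader.read(out, 0, out.length));
  }
}
```

A tokenizer typically requests a full internal buffer's worth of characters per read, so a short input such as "test1 and more" would trigger the unclamped path on the first call. Whether this matches the actual defect is for the attached patch to confirm.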