The problem appears to be fixed in
apache-blur-0.2.4-incubating-SNAPSHOT-hadoop1-bin. Yay! It would still be nice
to know whether there is an easy workaround for 0.2.3, but it's not a big deal
if there isn't.

However, instead of throwing the exception, the command "blur terms lia
technology.url" now prints the terms and then just hangs.

One possible fix is to add an if test at the end of the while loop in
TermsDataCommand.doitInternal, like this:

    while (true) {
      List<String> terms = client.terms(tablename, family, column, startWith, size);
      for (int i = 0; i < terms.size(); i++) {
        String term = terms.get(i);
        if (term.equals(startWith)) {
          continue;
        }
        if (checkFreq) {
          out.println(client.recordFrequency(tablename, family, column, term), term);
        } else {
          out.println(" - ", term);
        }
        startWith = term;
      }
      // Added: stop when nothing, or only the last term seen, comes back.
      if (terms.isEmpty() || (terms.size() == 1 && terms.get(0).equals(startWith))) {
        break;
      }
    }

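For anyone who wants to try the same stopping rule on the client side, here is
a rough, self-contained sketch in plain Java. TermSource and inMemory are
stand-ins I made up for illustration; they are not Blur classes. In real code
the lookup would be the Iface.terms call from my original message. The sketch
assumes that terms come back sorted, that a page begins with startWith when
that term exists, and that the fetch size is at least 2. I have only tried it
against the in-memory stand-in, not against a 0.2.3 server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TermPager {

  /** Hypothetical stand-in for a terms lookup: up to size terms >= startWith, sorted. */
  interface TermSource {
    List<String> terms(String startWith, int size);
  }

  /** Pages through all terms and stops as soon as a page yields nothing new. */
  static List<String> allTerms(TermSource source, int size) {
    if (size < 2) {
      // With size 1 every page after the first would only repeat startWith.
      throw new IllegalArgumentException("size must be at least 2");
    }
    List<String> result = new ArrayList<>();
    String startWith = "";
    while (true) {
      List<String> page = source.terms(startWith, size);
      boolean advanced = false;
      for (String term : page) {
        if (term.isEmpty() || term.equals(startWith)) {
          continue; // padding, or the term the previous page ended on
        }
        result.add(term);
        startWith = term;
        advanced = true;
      }
      if (!advanced) {
        break; // empty page, or a page holding only the last term seen
      }
    }
    return result;
  }

  /** In-memory TermSource over an already sorted list, for trying the loop out. */
  static TermSource inMemory(final List<String> sorted) {
    return (startWith, size) -> {
      List<String> page = new ArrayList<>();
      for (String t : sorted) {
        if (t.compareTo(startWith) >= 0 && page.size() < size) {
          page.add(t);
        }
      }
      return page;
    };
  }

  public static void main(String[] args) {
    TermSource source = inMemory(Arrays.asList("antbook", "detail", "exec", "http", "lucene"));
    System.out.println(allTerms(source, 2));
  }
}
```

The loop tracks whether a page produced anything new instead of inspecting the
page size, so it also stops on a page that contains only padding or only the
previous term.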
-- Tom
On Mon, Oct 6, 2014 at 8:37 AM, Tom Hood <[email protected]> wrote:
> Hi Tim,
>
> Sure, I can try building the trunk version and see if that fixes it.
>
> However, we are using 0.2.3 at work. Do you recall if there was a
> workaround for the issue? It's not a big issue, but if there is a
> workaround, I'll use it.
>
> Thanks,
> -- Tom
>
>
> On Sun, Oct 5, 2014 at 5:03 PM, Tim Williams <[email protected]> wrote:
>
>> Hi Tom,
>> Are you comfortable trying out a trunk version? If so, I'm wondering
>> if you can reproduce this on trunk - as this seems similar to an issue
>> recently resolved.
>>
>> --tim
>>
>>
>> On Sun, Oct 5, 2014 at 4:47 PM, Tom Hood <[email protected]> wrote:
>> > Hi,
>> >
>> > I'm new to blur and have been spending a little time today learning the
>> > 0.2.3 API. I'm having trouble dumping the terms of a blur index.
>> >
>> > Here's some code that uses Iface.terms that sort of works (see below), but
>> > has an issue depending on the size parameter passed to Iface.terms
>> >
>> > It wasn't obvious to me how to detect the end-of-terms condition, so if
>> > there's a cleaner way, please let me know.
>> >
>> > public static void DumpTerms(Iface blurClient, String tableName)
>> >     throws BlurException, TException
>> > {
>> >   Schema schema = blurClient.schema(tableName);
>> >   for (Map<String,ColumnDefinition> familyDef : schema.getFamilies().values()) {
>> >     for (ColumnDefinition columnDef : familyDef.values()) {
>> >       DumpTermsForColumn(blurClient, tableName, columnDef);
>> >     }
>> >   }
>> > }
>> >
>> > public static void DumpTermsForColumn(Iface blurClient,
>> >                                       String tableName,
>> >                                       ColumnDefinition columnDef)
>> >     throws BlurException, TException
>> > {
>> >   String family = columnDef.getFamily();
>> >   String column = columnDef.getColumnName();
>> >   String type = columnDef.getFieldType();
>> >
>> >   System.out.println(columnDef);
>> >   if (   !type.equals(TextFieldTypeDefinition.NAME)
>> >       && !type.equals(StringFieldTypeDefinition.NAME)) {
>> >     System.out.println(" WARNING: terms unavailable for type " + type);
>> >     return;
>> >   }
>> >
>> >   String startTerm = "";
>> >   int termCount = 0;
>> >   final short termFetchSize = 20; // loop logic assumes this is at least 2
>> >   while (true) {
>> >     List<String> terms = blurClient.terms(tableName, family, column,
>> >                                           startTerm, termFetchSize);
>> >     if (terms.isEmpty()
>> >         || (terms.size() == 1 && terms.get(0).equals(startTerm))) {
>> >       return;
>> >     }
>> >     for (String term : terms) {
>> >       if (term.equals(startTerm)) {
>> >         // 1st term is startTerm on calls 2-N of blurClient.terms
>> >         continue;
>> >       }
>> >       if (term.isEmpty()) {
>> >         // empty string returned when termFetchSize > terms left
>> >         return;
>> >       }
>> >       startTerm = term;
>> >       long termFreq = blurClient.recordFrequency(tableName, family,
>> >                                                  column, term);
>> >       System.out.println(" term " + ++termCount
>> >                          + ": [" + term + "] freq=" + termFreq);
>> >     }
>> >   }
>> > }
>> >
>> > ColumnDefinition(family:technology, columnName:author, subColumnName:null,
>> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
>> > term 1: [andy] freq=1
>> > term 2: [beck] freq=1
>> > term 3: [dave] freq=1
>> > term 4: [douglas] freq=1
>> > term 5: [erik] freq=2
>> > term 6: [gospodnetic] freq=1
>> > term 7: [hatcher] freq=2
>> > term 8: [hofstadter] freq=1
>> > term 9: [howard] freq=1
>> > term 10: [hunt] freq=1
>> > term 11: [husted] freq=1
>> > term 12: [kent] freq=1
>> > term 13: [lewis] freq=1
>> > term 14: [loughran] freq=1
>> > term 15: [massol] freq=1
>> > term 16: [otis] freq=1
>> > term 17: [papert] freq=1
>> > term 18: [seymour] freq=1
>> > term 19: [ship] freq=1
>> > term 20: [steve] freq=1
>> > term 21: [ted] freq=1
>> > term 22: [thomas] freq=1
>> > term 23: [vincent] freq=1
>> > ColumnDefinition(family:technology, columnName:title, subColumnName:null,
>> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
>> > term 1: [action] freq=3
>> > term 2: [an] freq=1
>> > term 3: [ant] freq=1
>> > term 4: [bach] freq=1
>> > term 5: [braid] freq=1
>> > term 6: [development] freq=1
>> > term 7: [escher] freq=1
>> > term 8: [eternal] freq=1
>> > term 9: [explained] freq=1
>> > term 10: [extreme] freq=1
>> > term 11: [g] freq=1
>> > term 12: [golden] freq=1
>> > term 13: [in] freq=3
>> > term 14: [java] freq=1
>> > term 15: [junit] freq=1
>> > term 16: [lucene] freq=1
>> > term 17: [mindstorms] freq=1
>> > term 18: [pragmatic] freq=1
>> > term 19: [programmer] freq=1
>> > term 20: [programming] freq=1
>> > term 21: [tapestry] freq=1
>> > term 22: [the] freq=1
>> > term 23: [u00f6del] freq=1
>> > term 24: [with] freq=1
>> > ColumnDefinition(family:technology, columnName:pubmonth, subColumnName:null,
>> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
>> > term 1: [197903] freq=1
>> > term 2: [198001] freq=1
>> > term 3: [199910] freq=2
>> > term 4: [200208] freq=1
>> > term 5: [200310] freq=1
>> > term 6: [200403] freq=1
>> > term 7: [200406] freq=1
>> > ColumnDefinition(family:technology, columnName:subject, subColumnName:null,
>> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
>> > term 1: [agile] freq=2
>> > term 2: [ant] freq=1
>> > term 3: [apache] freq=1
>> > term 4: [artificial] freq=1
>> > term 5: [build] freq=1
>> > term 6: [children] freq=1
>> > term 7: [components] freq=1
>> > term 8: [computers] freq=1
>> > term 9: [developer] freq=1
>> > term 10: [development] freq=2
>> > term 11: [driven] freq=1
>> > term 12: [education] freq=1
>> > term 13: [extreme] freq=1
>> > term 14: [ideas] freq=1
>> > term 15: [intelligence] freq=1
>> > term 16: [interface] freq=1
>> > term 17: [jakarta] freq=1
>> > term 18: [java] freq=1
>> > term 19: [junit] freq=2
>> > term 20: [logo] freq=1
>> > term 21: [lucene] freq=1
>> > term 22: [mathematics] freq=1
>> > term 23: [methodology] freq=2
>> > term 24: [mock] freq=1
>> > term 25: [music] freq=1
>> > term 26: [number] freq=1
>> > term 27: [objects] freq=1
>> > term 28: [powerful] freq=1
>> > term 29: [pragmatic] freq=1
>> > term 30: [programming] freq=1
>> > term 31: [search] freq=1
>> > term 32: [tapestry] freq=1
>> > term 33: [test] freq=1
>> > term 34: [testing] freq=1
>> > term 35: [theory] freq=1
>> > term 36: [tool] freq=1
>> > term 37: [tools] freq=1
>> > term 38: [unit] freq=1
>> > term 39: [user] freq=1
>> > ColumnDefinition(family:technology, columnName:isbn, subColumnName:null,
>> > fieldLessIndexed:false, fieldType:string, properties:null, sortable:false)
>> > term 1: [020161622X] freq=1
>> > term 2: [0201616416] freq=1
>> > term 3: [0465026567] freq=1
>> > term 4: [0465046290] freq=1
>> > term 5: [1930110588] freq=1
>> > term 6: [1930110995] freq=1
>> > term 7: [1932394117] freq=1
>> > term 8: [tbd] freq=1
>> > ColumnDefinition(family:technology, columnName:url, subColumnName:null,
>> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
>> > term 1: [0201616416] freq=1
>> > term 2: [0465026567] freq=1
>> > term 3: [antbook] freq=1
>> > term 4: [detail] freq=2
>> > term 5: [exec] freq=2
>> > term 6: [http] freq=8
>> > term 7: [index.shtml] freq=1
>> > term 8: [lewisship] freq=1
>> > term 9: [lucene] freq=1
>> > term 10: [massol] freq=1
>> > term 11: [obidos] freq=2
>> > term 12: [ppbook] freq=1
>> > term 13: [tg] freq=2
>> > term 14: [www.amazon.com] freq=2
>> > term 15: [www.manning.com] freq=4
>> > term 16: [www.papert.org] freq=1
>> > term 17: [www.pragmaticprogrammer.com] freq=1
>> > Exception in thread "main" BlurException(message:Call execution exception
>> > [[lia, technology, url, www.pragmaticprogrammer.com, 20]],
>> > stackTraceStr:java.lang.ArrayIndexOutOfBoundsException: 128
>> > at org.apache.lucene.store.ByteArrayDataInput.readVInt(ByteArrayDataInput.java:104)
>> > at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextLeaf(BlockTreeTermsReader.java:2467)
>> > at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next(BlockTreeTermsReader.java:2459)
>> > at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2139)
>> > at org.apache.blur.index.ExitableReader$ExitableTermsEnum.next(ExitableReader.java:233)
>> > at org.apache.blur.manager.IndexManager.terms(IndexManager.java:1031)
>> > at org.apache.blur.manager.IndexManager$9.call(IndexManager.java:982)
>> > at org.apache.blur.manager.IndexManager$9.call(IndexManager.java:976)
>> > at org.apache.blur.utils.ForkJoin$2.call(ForkJoin.java:63)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > at org.apache.blur.concurrent.ThreadWatcher$ThreadWatcherExecutorService$1.run(ThreadWatcher.java:127)
>> > at org.apache.blur.concurrent.BlurThreadPoolExecutor$1.run(BlurThreadPoolExecutor.java:83)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> > at java.lang.Thread.run(Thread.java:662)
>> > , errorType:UNKNOWN)
>> > at org.apache.blur.thrift.generated.Blur$terms_result$terms_resultStandardScheme.read(Blur.java:26728)
>> > at org.apache.blur.thrift.generated.Blur$terms_result$terms_resultStandardScheme.read(Blur.java:26696)
>> > at org.apache.blur.thrift.generated.Blur$terms_result.read(Blur.java:26638)
>> > at org.apache.blur.thirdparty.thrift_0_9_0.TServiceClient.receiveBase(TServiceClient.java:78)
>> > at org.apache.blur.thrift.generated.Blur$Client.recv_terms(Blur.java:1212)
>> > at org.apache.blur.thrift.generated.SafeClientGen.recv_terms(SafeClientGen.java:508)
>> > at org.apache.blur.thrift.generated.Blur$Client.terms(Blur.java:1195)
>> > at org.apache.blur.thrift.generated.SafeClientGen.terms(SafeClientGen.java:942)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler$1.call(BlurClient.java:60)
>> > at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler$1.call(BlurClient.java:56)
>> > at org.apache.blur.thrift.AbstractCommand.call(AbstractCommand.java:62)
>> > at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:197)
>> > at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:56)
>> > at com.sun.proxy.$Proxy0.terms(Unknown Source)
>> > at hoodware.sandbox.blur.BlurIndexMain.DumpTermsForColumn(BlurIndexMain.java:88)
>> > at hoodware.sandbox.blur.BlurIndexMain.DumpTerms(BlurIndexMain.java:64)
>> > at hoodware.sandbox.blur.BlurIndexMain.main(BlurIndexMain.java:38)
>> >
>> > The code works if I change termFetchSize to 2 instead of 20.
>> >
>> > The command "blur terms lia technology.url" will get the same exception.
>> >
>> > The command "blur terms lia technology.url -s2" will not get the exception,
>> > but goes into an infinite loop after it outputs:
>> > "- |www.pragmaticprogrammer.com "
>> >
>> > Attached is the csv file that I loaded into an empty table. It's a
>> > reformatted version of the Lucene In Action book's sample data (taken from
>> > data directory in
>> > http://www.manning-source.com/books/hatcher2/LuceneInAction.zip)
>> >
>> > I created the table with the commands:
>> >
>> > hadoop fs -mkdir lia_input
>> > hadoop fs -copyFromLocal ~/projects/lucene/LuceneInAction.csv lia_input
>> > hadoop fs -mkdir tables
>> > blur create -t lia -c 2 -l tables/lia
>> >
>> > foreach family (health technology philosophy education)
>> > blur definecolumn lia $family title text
>> > blur definecolumn lia $family isbn string
>> > blur definecolumn lia $family author text
>> > # blur definecolumn lia $family pubmonth date -p dateFormat yyyyMM
>> > blur definecolumn lia $family pubmonth text # must be text for Blur.Iface.terms
>> > blur definecolumn lia $family subject text
>> > blur definecolumn lia $family url text
>> > end
>> >
>> > blur csvloader -c localhost:40010 -A -a -t lia -i lia_input -s';' \
>> > -d 'health title isbn author pubmonth subject url' \
>> > -d 'technology title isbn author pubmonth subject url' \
>> > -d 'philosophy title isbn author pubmonth subject url' \
>> > -d 'education title isbn author pubmonth subject url'
>> >
>> > Please let me know if you have any ideas on what I'm doing wrong.
>> >
>> > Thanks,
>> > -- Tom
>> >
>> >
>>
>
>