Re: A question about FacetField constructor
What do you mean by "does not index anything"? Do you get an exception when you add a String[] with more than one element? You should probably call conf.setHierarchical(dimension, true), but if you don't do that you should receive an IllegalArgumentException telling you to do so.

Shai

On Sun, Jun 22, 2014 at 6:34 AM, west suhanic west.suha...@gmail.com wrote:

Hello All:

I am building sample code using Lucene v4.8.1 to explore the new facet API. The problem I am having is that if I pass a populated string array, nothing gets indexed, while if I pass only the first element of the string array, that value gets indexed. The code below shows the case that works and the case that does not. What am I doing wrong?

***Start of code sample***

void showStuff( String... va )
{
    /** This code prints out the contents of va successfully. **/
    for( int ii = 0 ; ii < va.length ; ii++ )
        System.out.println( "value[" + ii + "] " + va[ii] );
}

for( final Map<String, String[]> fd : allFacetData )
{
    final Document doc = new Document();
    for( final Map.Entry<String, String[]> entry : fd.entrySet() )
    {
        final String key = entry.getKey();
        String[] value = entry.getValue();
        showStuff( value );

        /** This call indexes successfully **/
        final FacetField newFF = new FacetField( key, value[0] );

        /**
         * This call will not index anything if the value String array
         * has more than one element.
         * final FacetField newFF = new FacetField( key, value );
         */
        doc.add( newFF );
    }

    try
    {
        final Document theBuildDoc = configFacetsHandle.build( taxoWriter, doc );
        indexWriter.addDocument( theBuildDoc );
        indexWriter.addDocument( configFacetsHandle.build( taxoWriter, doc ) );
    }
    catch( IOException ioe )
    {
        eMsg.append( method );
        eMsg.append( " failed with the exception " );
        eMsg.append( ioe.toString() );
        return constantValuesInterface.FAILURE;
    }
}

***End of code sample***

regards, West Suhanic
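The check described in the reply above can be modeled without Lucene on the classpath. The sketch below is a toy stand-in, not Lucene's implementation: `FacetPathModel`, its `Config` class, and `build` are hypothetical names, and the exception message is only an illustration. It assumes a constructor shaped like `(String dim, String... path)`, so a String[] argument becomes the whole path with one level per element, and it assumes that a path with more than one component is rejected unless the dimension was marked hierarchical.

```java
import java.util.HashSet;
import java.util.Set;

final class FacetPathModel {

    /** Hypothetical stand-in for a facets config; not Lucene's class. */
    static final class Config {
        private final Set<String> hierarchicalDims = new HashSet<>();

        void setHierarchical(String dim, boolean value) {
            if (value) {
                hierarchicalDims.add(dim);
            } else {
                hierarchicalDims.remove(dim);
            }
        }

        boolean isHierarchical(String dim) {
            return hierarchicalDims.contains(dim);
        }
    }

    /**
     * Models a (String dim, String... path) signature: a String[] argument
     * becomes the whole path, one array element per level.
     */
    static String build(Config config, String dim, String... path) {
        if (path.length == 0) {
            throw new IllegalArgumentException("path must have at least one component");
        }
        if (path.length > 1 && !config.isHierarchical(dim)) {
            throw new IllegalArgumentException(
                "dimension \"" + dim + "\" is not hierarchical yet has "
                    + path.length + " components");
        }
        return dim + "/" + String.join("/", path);
    }

    public static void main(String[] args) {
        Config config = new Config();
        String[] value = {"2014", "6", "22"};

        // One component: accepted even for a flat dimension.
        System.out.println(build(config, "Publish Date", value[0]));

        // Three components on a flat dimension: rejected in this model.
        try {
            build(config, "Publish Date", value);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // After marking the dimension hierarchical the full path is accepted.
        config.setHierarchical("Publish Date", true);
        System.out.println(build(config, "Publish Date", value));
    }
}
```

If the real code behaves like this model, a multi-element array indexed under a hierarchical dimension is stored as a single nested path, so the counts for the dimension's direct children would show only the first element.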
AW: fuzzy/case insensitive AnalyzingSuggester )
Oli, thanks for your valuable input!

"Generally, we found it beneficial to not combine all functionality in a single suggester"

That makes absolute sense, but it doesn't help keep the RAM load low ;) unless you go with WFSTs. What we have done so far is build a term index based on the terms of the corresponding (data) index, i.e. an index always comes paired with its corresponding term index.

-----Original Message-----
From: Oliver Christ [mailto:ochr...@ebsco.com]
Sent: Friday, June 20, 2014 15:52
To: java-user@lucene.apache.org
Subject: RE: fuzzy/case insensitive AnalyzingSuggester )

Hi Clemens,

I haven't yet built a suggester which combines all three, and am not aware of one. I'd love to have one though ;-)

Case and diacritics insensitivity is supported out of the box by the analyzing suggesters, including the FuzzySuggester. The logic is in the Analyzer. I haven't yet tried out AnalyzingInfixSuggester, and haven't investigated whether it's possible to combine that with FuzzySuggester (which is also an analyzing suggester).

Due to memory constraints, we build infix suggesters by adding each relevant substring, but use WFST suggesters with payloads as the base, to reduce RAM load at runtime. We call the analyzer in the dictionary iterator. At search time, we look up the surface form (completion) in a secondary index using the payload as a key (and for deduping).

If FuzzySuggester supports payloads (haven't checked), you could get an infix suggester using the same approach. That will lead to large automata, and as you'd have to look up the completion in a secondary index, you'd never use the surface form returned by the automaton itself, so it's a waste of space. WFSTs are more space-efficient but don't support payloads (if I remember correctly), and there's no fuzzy WFST suggester either :(

Generally, we found it beneficial to not combine all functionality in a single suggester, but use separate automata in a cascaded model.
We first look up completions in the prefix non-fuzzy suggester. Based on several criteria, we may then consult the infix suggester and, if needed, the fuzzy suggester. The rationale is that we don't want high-ranking fuzzy or infix hits to fill up the completion list while there are good (but less popular) prefix hits. Having control over which suggester is used when, and how its specific suggestions are merged into the final result list, helps improve the user experience, at least with our use cases.

Cheers, Oli

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Friday, June 20, 2014 6:47 AM
To: java-user@lucene.apache.org
Subject: AW: fuzzy/case insensitive AnalyzingSuggester )

Sorry for re-asking. Has anyone implemented an AnalyzingSuggester which
- is fuzzy
- is case insensitive (or must/should this be implemented by the analyzer?)
- does infix search
[- has a small memory footprint]

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Friday, June 13, 2014 14:53
To: java-user@lucene.apache.org
Subject: fuzzy/case insensitive AnalyzingSuggester )

Looking for an AnalyzingSuggester which supports
- fuzziness
- case insensitivity
- small (in-memory) footprint (*)

(*) Just tried to hand my big IndexReader (see other post "[lucene 4.6] NPE when calling IndexReader#openIfChanged") into JaspellLookup. Got an OOM. Is there any (Jaspell)Lookup implementation that can handle really big indexes (by swapping out part of the lookup table)?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
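The cascaded model Oli describes (prefix first, then infix, then fuzzy, with later stages only filling leftover slots) might be sketched as follows. This is a minimal, dependency-free illustration, not the setup used by anyone in the thread: `CascadedSuggest`, `Stage`, and the three toy stages over an in-memory word list are hypothetical, and the "several criteria" for consulting a later stage are reduced here to a single assumption, namely that the list is not yet full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

final class CascadedSuggest {

    /** One stage of the cascade; returns completions, best first. */
    interface Stage {
        List<String> lookup(String query, int max);
    }

    /** Consults stages in order; later stages only fill the slots left over. */
    static List<String> suggest(String query, int max, List<Stage> stages) {
        Set<String> merged = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (Stage stage : stages) {
            if (merged.size() >= max) {
                break; // earlier stages already filled the list
            }
            for (String completion : stage.lookup(query, max)) {
                if (merged.size() < max) {
                    merged.add(completion);
                }
            }
        }
        return new ArrayList<>(merged);
    }

    /** Toy prefix stage: case-insensitive startsWith over a word list. */
    static Stage prefix(List<String> words) {
        return (q, max) -> filter(words, max, w -> lower(w).startsWith(lower(q)));
    }

    /** Toy infix stage: case-insensitive substring match. */
    static Stage infix(List<String> words) {
        return (q, max) -> filter(words, max, w -> lower(w).contains(lower(q)));
    }

    /** Toy fuzzy stage: the word's leading characters are within one edit of the query. */
    static Stage fuzzyPrefix(List<String> words) {
        return (q, max) -> filter(words, max, w -> {
            String lq = lower(q);
            String lw = lower(w);
            String head = lw.substring(0, Math.min(lw.length(), lq.length()));
            return editDistance(lq, head) <= 1;
        });
    }

    private static List<String> filter(List<String> words, int max, Predicate<String> keep) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (out.size() < max && keep.test(w)) {
                out.add(w);
            }
        }
        return out;
    }

    private static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Plain Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(subst, Math.min(prev[j], cur[j - 1]) + 1);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Lucene", "lucid", "translucent", "Solr", "lupine");
        List<Stage> stages = Arrays.asList(prefix(words), infix(words), fuzzyPrefix(words));
        System.out.println(suggest("luc", 3, stages));
        System.out.println(suggest("lux", 5, stages));
    }
}
```

Keeping each stage behind its own interface is what gives the caller control over ordering and merging; in a real deployment each `Stage` would presumably wrap a separate suggester instance rather than scan a list.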
Re: Lucene Facets Module 4.8.1
I will dig further into your suggestions, and also assert the FacetsConfig object.

While debugging I found the buildFacetsResult(...) method in DrillSideways.java. It internally invokes the following constructor from FastTaxonomyFacetCounts.java:

FastTaxonomyFacetCounts(...) {
    this(FacetsConfig.DEFAULT_INDEX_FIELD_NAME, taxoReader, config, fc); // FacetsConfig.DEFAULT_INDEX_FIELD_NAME is '$facets'
}

Shouldn't it invoke the following constructor with the correct indexFieldName? In my case the indexFieldName is 'city', which has dimension 'CITY'.

FastTaxonomyFacetCounts(String indexFieldName, TaxonomyReader taxoReader, FacetsConfig config, FacetsCollector fc) throws IOException {
    super(indexFieldName, taxoReader, config);
    ...
}

Thanks
Jigar Shah.

On Sat, Jun 21, 2014 at 11:01 PM, Shai Erera ser...@gmail.com wrote:

If you can, while in debug mode try to note the instance ID of the FacetsConfig, and assert it is indeed the same (i.e. indexConfig == searchConfig).

Shai

On Sat, Jun 21, 2014 at 8:26 PM, Michael McCandless luc...@mikemccandless.com wrote:

Are you sure it's the same FacetsConfig at search time? Because the exception implies your CITY field didn't have config.setIndexFieldName("CITY", "city") called. Or, can you try commenting out 'config.setIndexFieldName("CITY", "city")' at index time and see if the exception still happens?

Mike McCandless
http://blog.mikemccandless.com

On Sat, Jun 21, 2014 at 1:08 AM, Jigar Shah jigaronl...@gmail.com wrote:

Thanks for helping me. Yes, I did a couple of things. Below is the simple code which I use for indexing:

TrackingIndexWriter nrtWriter = ...
DirectoryTaxonomyWriter taxoWriter = ...
FacetsConfig config = new FacetsConfig();
config.setHierarchical("CITY", true);
config.setMultiValued("CITY", true);
config.setIndexFieldName("CITY", "city"); // I kept dimName different from indexFieldName

// Added indexing searchable fields...
doc.add( new FacetField("CITY", "India", "Gujarat", "Vadodara") );
doc.add( new FacetField("CITY", "India", "Gujarat", "Ahmedabad") );
nrtWriter.addDocument(config.build(taxoWriter, doc));

Below is the code which I use for searching:

TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
Query query = ...
IndexSearcher searcher = ...
DrillDownQuery ddq = new DrillDownQuery(config, query);
DrillSideways ds = new DrillSideways(searcher, config, taxoReader); // Config object is the same one I created before
DrillSidewaysResult result = ds.search(ddq, null, null, start + limit, null, true, true);
...
Facets f = result.facets;
FacetResult fr = f.getTopChildren(5, "CITY"); // [Exception is generated]

I didn't really perform any drill-down; it's just the original query for the first time, but wrapped in a DrillDownQuery.

... and the call below gives me an empty collection:

List<FacetResult> frs = f.getAllDims(5);

I debugged the source code and found that it internally calls

FastTaxonomyFacetCounts(indexFieldName, taxoReader, config) // Config object is the same one I created before

which then calls

IntTaxonomyFacets(indexFieldName, taxoReader, config) // Config object is the same one I created before

During these calls the value of indexFieldName is $facets, defined by the constant 'public static final String DEFAULT_INDEX_FIELD_NAME = "$facets";' in FacetsConfig.

My question is: if I am using the same FacetsConfig while indexing and searching, why is it not identifying the correct name of the field, and going for $facets instead? Please correct me if I understood this wrong, or tell me the correct way to solve the above problem.

Many Thanks.
Jigar Shah.
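The mismatch discussed in this thread can be modeled in isolation. The sketch below is a toy, not Lucene code: `IndexFieldNameModel`, `Config`, and `Counts` are hypothetical stand-ins, and the consistency check is an assumption based on Mike's remark that the exception implies the dimension was not configured for the field being read. It shows a counter bound to one index field rejecting a dimension that the config maps to a different field.

```java
import java.util.HashMap;
import java.util.Map;

final class IndexFieldNameModel {

    static final String DEFAULT_INDEX_FIELD_NAME = "$facets";

    /** Toy stand-in for a facets config: dimension -> index field name. */
    static final class Config {
        private final Map<String, String> fieldByDim = new HashMap<>();

        void setIndexFieldName(String dim, String field) {
            fieldByDim.put(dim, field);
        }

        /** Dimensions without an explicit mapping fall back to the default field. */
        String indexFieldName(String dim) {
            return fieldByDim.getOrDefault(dim, DEFAULT_INDEX_FIELD_NAME);
        }
    }

    /** Toy stand-in for a counter that reads exactly one index field. */
    static final class Counts {
        private final String indexFieldName;
        private final Config config;

        Counts(String indexFieldName, Config config) {
            this.indexFieldName = indexFieldName;
            this.config = config;
        }

        /** Rejects dimensions that the config maps to a different field. */
        String topChildren(String dim) {
            String expected = config.indexFieldName(dim);
            if (!expected.equals(indexFieldName)) {
                throw new IllegalArgumentException(
                    "dimension \"" + dim + "\" was indexed into field \"" + expected
                        + "\", not \"" + indexFieldName + "\"");
            }
            return dim + " counted from " + indexFieldName;
        }
    }

    public static void main(String[] args) {
        Config config = new Config();
        config.setIndexFieldName("CITY", "city");

        // A counter built for the dimension's own field works.
        System.out.println(new Counts("city", config).topChildren("CITY"));

        // A counter built for the default field fails, even with the same config.
        try {
            new Counts(DEFAULT_INDEX_FIELD_NAME, config).topChildren("CITY");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model, sharing one config object between indexing and searching is not enough; the counter also has to be constructed with the dimension's field name. Whether the right fix in Lucene 4.8.1 is to subclass DrillSideways and build the counts with the custom field name is not confirmed in the thread.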
Re: EarlyTerminatingSortingCollector help needed..
Thanks for your reply & clarifications.

"What do you mean by 'When I use a SortField instead'? Unless you are using early termination, Collector.collect is supposed to be called for every matching document"

For a normal sorting query on a top-level searcher, I execute

TopDocs docs = searcher.search(query, 50, sortField)

Then I can issue reader.document() for the final list of exactly 50 docs, which gives me a global order across segments, but at the obvious cost of memory...

SortingMergePolicy + ETSC will make me do 50*N [N = no. of segments] collects, which could increase the cost of seeks when each segment collects considerable hits...

"- you can afford the merging overhead (i.e. for heavy indexing workloads, this might not be the best solution)
- there is a single sort order that is used for most queries
- you don't need any feature that requires to collect all documents (like computing the total hit count or facets)."

Our use case fits all 3 of these points perfectly, and that's why we wanted to explore this. But our final set of results must also be globally ordered. Maybe it's a mistake to assume that sorting can be entirely replaced with SMP + ETSC...

"I would not advise to use the stored fields API, even in the context of early termination. Doc values should be more efficient here?"

I read your excellent blog post on stored-fields compression, where you mention that stored fields now take only one random seek. [ http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1 ] If so, what could still make DocValues the winner?

--
Ravi

On Sat, Jun 21, 2014 at 6:41 PM, Adrien Grand jpou...@gmail.com wrote:

Hi Ravikumar,

On Fri, Jun 20, 2014 at 12:14 PM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote:

If my numDocsToCollect = 50 and no. of segments = 15, then collector.collect() will be called 750 times.

That is the worst case indeed. However, if some of your segments have fewer than 50 matches, `collect` will only be called on those matches.

When I use a SortField instead, then TopFieldDocs does the sorting for all segments and collector.collect() will be called only 50 times...

What do you mean by "When I use a SortField instead"? Unless you are using early termination, Collector.collect is supposed to be called for every matching document.

Assuming a stored-field seek for every collector.collect(), will it be advisable to still persist with ETSC? Was it introduced as a trade-off b/n memory & disk?

I would not advise to use the stored fields API, even in the context of early termination. Doc values should be more efficient here?

The trade-off is not really about memory and disk. What it tries to achieve is to make queries much faster provided that:
- you can afford the merging overhead (i.e. for heavy indexing workloads, this might not be the best solution)
- there is a single sort order that is used for most queries
- you don't need any feature that requires to collect all documents (like computing the total hit count or facets).

--
Adrien
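The collect-count arithmetic in this exchange, and the question of global ordering, can be illustrated with a small model. This is a sketch under stated assumptions, not Lucene's collector: `EarlyTerminationModel` and its `topN` method are hypothetical, each segment is represented as an int array already sorted by the sort key (the role the thread gives to SortingMergePolicy), and a bounded max-heap plays the part of the top-N queue shared across segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class EarlyTerminationModel {

    /** Global top-n plus the number of per-hit collect steps it took. */
    static final class Result {
        final List<Integer> top;
        final int collectCalls;

        Result(List<Integer> top, int collectCalls) {
            this.top = top;
            this.collectCalls = collectCalls;
        }
    }

    /**
     * Each segment is assumed to be pre-sorted ascending by sort key, so only
     * its first n hits can belong to the global top n; the rest is skipped.
     */
    static Result topN(List<int[]> sortedSegments, int n) {
        // Max-heap holding the n smallest keys seen so far across all segments.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(Collections.reverseOrder());
        int collectCalls = 0;
        for (int[] segment : sortedSegments) {
            int collectedHere = 0;
            for (int sortKey : segment) {
                if (collectedHere == n) {
                    break; // early termination for this segment
                }
                collectCalls++;
                collectedHere++;
                heap.add(sortKey);
                if (heap.size() > n) {
                    heap.poll(); // drop the current worst candidate
                }
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        Collections.sort(top);
        return new Result(top, collectCalls);
    }

    public static void main(String[] args) {
        List<int[]> segments = Arrays.asList(
            new int[] {1, 4, 7, 10},
            new int[] {2, 3, 20},
            new int[] {5});
        Result r = topN(segments, 2);
        System.out.println("top = " + r.top + ", collect calls = " + r.collectCalls);
    }
}
```

With n = 2 and three segments the upper bound is 2 * 3 = 6 collect steps, and the model performs 5 because the last segment has only one hit, which matches Adrien's remark about segments with fewer matches. The merged result is still globally ordered, since every segment contributes its own best candidates to the shared queue.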
Re: A question about FacetField constructor
Hello:

"What do you mean by 'does not index anything'?"

When I do a search, the value returned for the dim set to "Publish Date" is null. If I pass through value[0], the publish date year is returned by the search. setHierarchical was called. When a String[] with more than one element is passed, an exception is not thrown.

I am open to all suggestions as to what I am missing.

regards, west suhanic

On Sun, Jun 22, 2014 at 3:23 AM, Shai Erera ser...@gmail.com wrote:

What do you mean by "does not index anything"? Do you get an exception when you add a String[] with more than one element? You should probably call conf.setHierarchical(dimension, true), but if you don't do that you should receive an IllegalArgumentException telling you to do so.

Shai

On Sun, Jun 22, 2014 at 6:34 AM, west suhanic west.suha...@gmail.com wrote:

Hello All:

I am building sample code using Lucene v4.8.1 to explore the new facet API. The problem I am having is that if I pass a populated string array, nothing gets indexed, while if I pass only the first element of the string array, that value gets indexed. The code below shows the case that works and the case that does not. What am I doing wrong?

***Start of code sample***

void showStuff( String... va )
{
    /** This code prints out the contents of va successfully. **/
    for( int ii = 0 ; ii < va.length ; ii++ )
        System.out.println( "value[" + ii + "] " + va[ii] );
}

for( final Map<String, String[]> fd : allFacetData )
{
    final Document doc = new Document();
    for( final Map.Entry<String, String[]> entry : fd.entrySet() )
    {
        final String key = entry.getKey();
        String[] value = entry.getValue();
        showStuff( value );

        /** This call indexes successfully **/
        final FacetField newFF = new FacetField( key, value[0] );

        /**
         * This call will not index anything if the value String array
         * has more than one element.
         * final FacetField newFF = new FacetField( key, value );
         */
        doc.add( newFF );
    }

    try
    {
        final Document theBuildDoc = configFacetsHandle.build( taxoWriter, doc );
        indexWriter.addDocument( theBuildDoc );
        indexWriter.addDocument( configFacetsHandle.build( taxoWriter, doc ) );
    }
    catch( IOException ioe )
    {
        eMsg.append( method );
        eMsg.append( " failed with the exception " );
        eMsg.append( ioe.toString() );
        return constantValuesInterface.FAILURE;
    }
}

***End of code sample***

regards, West Suhanic
Re: A question about FacetField constructor
Reply wasn't sent to the list.

On Jun 22, 2014 8:15 PM, Shai Erera ser...@gmail.com wrote:

Can you post an example which demonstrates the problem? It would also be interesting to know how you count the facets, e.g. do you use a TaxonomyFacets object or something else? Have you looked at the facet demo code? It contains examples of using hierarchical facets.

Shai

On Jun 22, 2014 8:08 PM, west suhanic west.suha...@gmail.com wrote:

Hello:

"What do you mean by 'does not index anything'?"

When I do a search, the value returned for the dim set to "Publish Date" is null. If I pass through value[0], the publish date year is returned by the search. setHierarchical was called. When a String[] with more than one element is passed, an exception is not thrown.

I am open to all suggestions as to what I am missing.

regards, west suhanic

On Sun, Jun 22, 2014 at 3:23 AM, Shai Erera ser...@gmail.com wrote:

What do you mean by "does not index anything"? Do you get an exception when you add a String[] with more than one element? You should probably call conf.setHierarchical(dimension, true), but if you don't do that you should receive an IllegalArgumentException telling you to do so.

Shai

On Sun, Jun 22, 2014 at 6:34 AM, west suhanic west.suha...@gmail.com wrote:

Hello All:

I am building sample code using Lucene v4.8.1 to explore the new facet API. The problem I am having is that if I pass a populated string array, nothing gets indexed, while if I pass only the first element of the string array, that value gets indexed. The code below shows the case that works and the case that does not. What am I doing wrong?

***Start of code sample***

void showStuff( String... va )
{
    /** This code prints out the contents of va successfully. **/
    for( int ii = 0 ; ii < va.length ; ii++ )
        System.out.println( "value[" + ii + "] " + va[ii] );
}

for( final Map<String, String[]> fd : allFacetData )
{
    final Document doc = new Document();
    for( final Map.Entry<String, String[]> entry : fd.entrySet() )
    {
        final String key = entry.getKey();
        String[] value = entry.getValue();
        showStuff( value );

        /** This call indexes successfully **/
        final FacetField newFF = new FacetField( key, value[0] );

        /**
         * This call will not index anything if the value String array
         * has more than one element.
         * final FacetField newFF = new FacetField( key, value );
         */
        doc.add( newFF );
    }

    try
    {
        final Document theBuildDoc = configFacetsHandle.build( taxoWriter, doc );
        indexWriter.addDocument( theBuildDoc );
        indexWriter.addDocument( configFacetsHandle.build( taxoWriter, doc ) );
    }
    catch( IOException ioe )
    {
        eMsg.append( method );
        eMsg.append( " failed with the exception " );
        eMsg.append( ioe.toString() );
        return constantValuesInterface.FAILURE;
    }
}

***End of code sample***

regards, West Suhanic