Re: A question about FacetField constructor

2014-06-22 Thread Shai Erera
What do you mean by "does not index anything"? Do you get an exception when
you add a String[] with more than one element?

You should probably call conf.setHierarchical(dimension), but if you don't
do that you should receive an IllegalArgumentException telling you to do
that...
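
To make the String[] case concrete: FacetField takes its path as Java varargs (to my knowledge the 4.8 signature is FacetField(String dim, String... path)), so passing a whole array turns every element into one level of a hierarchical path, while passing value[0] yields a single-level path. The sketch below uses a hypothetical stand-in method, describe, rather than Lucene itself, and the sample values are made up for illustration.

```java
import java.util.Arrays;

final class VarargsPathDemo {
    // Hypothetical stand-in for a constructor shaped like FacetField(String dim, String... path).
    static String describe(String dim, String... path) {
        return dim + " -> " + path.length + " component(s) " + Arrays.toString(path);
    }

    public static void main(String[] args) {
        String[] value = {"2014", "06", "22"}; // made-up sample data

        // Passing one element: a flat, single-level path.
        System.out.println(describe("Publish Date", value[0]));

        // Passing the array: each element becomes one level of the path.
        System.out.println(describe("Publish Date", value));
    }
}
```

A multi-level path is what makes the dimension hierarchical, which is why the setHierarchical call matters for the array case but not for value[0].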

Shai


On Sun, Jun 22, 2014 at 6:34 AM, west suhanic west.suha...@gmail.com
wrote:

 Hello All:

 I am building sample code using lucene v4.8.1 to explore
 the new facet API. The problem I am having is that if I pass
 a populated string array nothing gets indexed while if
 I pass only the first element of the string array that value gets indexed.
 The code found below shows the case that works and the case that does not
 work. What am I doing wrong?

 ***Start of code sample***

 void showStuff( String... va )
 {
   /** This code prints out the contents of va successfully. **/
   for( int ii = 0 ; ii < va.length ; ii++ )
     System.out.println( "value[" + ii + "] " + va[ii] );
 }

 for( final Map<String, String[]> fd : allFacetData )
 {
   final Document doc = new Document();
   for( final Map.Entry<String, String[]> entry : fd.entrySet() )
   {
     final String key = entry.getKey();
     String[] value = entry.getValue();
     showStuff( value );

     /** This call indexes successfully **/
     final FacetField newFF = new FacetField( key, value[0] );

     /**
      * This call will not index anything if the value String array
      * has more than one element.
      * final FacetField newFF = new FacetField( key, value );
      */
     doc.add( newFF );
   }

   try
   {
     final Document theBuildDoc = configFacetsHandle.build( taxoWriter, doc );
     indexWriter.addDocument( theBuildDoc );
     indexWriter.addDocument( configFacetsHandle.build( taxoWriter, doc ) );
   }
   catch( IOException ioe )
   {
     eMsg.append( method );
     eMsg.append( " failed with the exception " );
     eMsg.append( ioe.toString() );
     return constantValuesInterface.FAILURE;
   }
 }

 ***End of code sample***

 regards,

 West Suhanic



AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-22 Thread Clemens Wyss DEV
Oli, 
thanks for your valuable inputs!

 Generally, we found it beneficial to not combine all functionality in a 
 single suggester
That makes perfect sense, but it doesn't help keep the RAM load low ;) unless 
you go with WFSTs. 

What we have done so far is build a term index based on the terms of the 
corresponding (data) index. That is, each index always comes paired with its 
corresponding term index.

-Original Message-
From: Oliver Christ [mailto:ochr...@ebsco.com] 
Sent: Friday, June 20, 2014 15:52
To: java-user@lucene.apache.org
Subject: RE: fuzzy/case insensitive AnalyzingSuggester )

Hi Clemens,

I haven't yet built a suggester which combines all three, and am not aware of 
one. I'd love to have one though ;-)

Case- and diacritics insensitivity is supported out-of-the-box by the analyzing 
suggesters, including the FuzzySuggester. The logic is in the Analyzer.

I haven't yet tried out AnalyzingInfixSuggester, and haven't investigated 
whether it's possible to combine that with FuzzySuggester (which also is an 
analyzing suggester).

Due to memory constraints, we build infix suggesters by adding each relevant 
substring, but use WFST suggesters with payloads as the base, to reduce RAM 
load at runtime. We call the analyzer in the dictionary iterator. At search 
time, we look up the surface form (completion) in a secondary index using the 
payload as a key (and for deduping).

If FuzzySuggester supports payloads (haven't checked), you could get an infix 
suggester using the same approach. That will lead to large automata, and as 
you'd have to look up the completion in a secondary index, you'd never use the 
surface form returned by the automaton itself, so it's a waste of space. WFSTs 
are more space-efficient but don't support payloads (if I remember correctly) 
and there's no fuzzy WFST suggester either :(

Generally, we found it beneficial to not combine all functionality in a single 
suggester, but use separate automata in a cascaded model. We first look up 
completions in the prefix non-fuzzy suggester. Based on several criteria, we 
may then consult the infix suggester, and if needed, the fuzzy suggester. The 
rationale is that we don't want high-ranking fuzzy or infix hits to fill up the 
completion list while there are good (but less popular) prefix hits. Having 
control over which suggester is used when, and how its specific suggestions are 
merged into the final result list, helps improve the user experience, at 
least with our use cases.
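
The cascaded model described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the three stages are plain functions over a toy vocabulary rather than real suggester automata, the only criterion for consulting the next stage is "the list is not full yet" (the actual criteria are not spelled out in this thread), and all names (CascadedSuggest, VOCAB, demo) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CascadedSuggest {
    // Toy vocabulary standing in for real suggester data.
    static final List<String> VOCAB = Arrays.asList("lucene", "lucid", "translucent", "lacuna");

    /** Consults the stages in order; later stages only fill slots that are still free. */
    static List<String> suggest(String input, int limit,
                                List<Function<String, List<String>>> stages) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (Function<String, List<String>> stage : stages) {
            if (merged.size() >= limit) {
                break; // earlier stages already produced enough completions
            }
            for (String completion : stage.apply(input)) {
                if (merged.size() >= limit) {
                    break;
                }
                merged.add(completion);
            }
        }
        return new ArrayList<>(merged);
    }

    static List<String> prefix(String q) {
        return VOCAB.stream().filter(w -> w.startsWith(q)).collect(Collectors.toList());
    }

    static List<String> infix(String q) {
        return VOCAB.stream().filter(w -> w.contains(q)).collect(Collectors.toList());
    }

    /** Very crude "fuzzy" match: at most one differing character in the leading q.length() chars. */
    static List<String> fuzzy(String q) {
        return VOCAB.stream()
                .filter(w -> w.length() >= q.length() && mismatches(w, q) <= 1)
                .collect(Collectors.toList());
    }

    static int mismatches(String word, String q) {
        int count = 0;
        for (int i = 0; i < q.length(); i++) {
            if (word.charAt(i) != q.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    /** Runs the cascade prefix -> infix -> fuzzy over the toy vocabulary. */
    static List<String> demo(String input, int limit) {
        List<Function<String, List<String>>> stages = new ArrayList<>();
        stages.add(CascadedSuggest::prefix);
        stages.add(CascadedSuggest::infix);
        stages.add(CascadedSuggest::fuzzy);
        return suggest(input, limit, stages);
    }

    public static void main(String[] args) {
        System.out.println(demo("luc", 2)); // prefix hits alone fill the list
        System.out.println(demo("luc", 4)); // infix and fuzzy hits fill the remaining slots
    }
}
```

Because prefix hits are added first and duplicates are dropped, infix or fuzzy hits can never push a prefix hit out of the list, which is the point of the cascade.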

Cheers, Oli

-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Friday, June 20, 2014 6:47 AM
To: java-user@lucene.apache.org
Subject: AW: fuzzy/case insensitive AnalyzingSuggester )

Sorry for re-asking. 
Has anyone implemented an AnalyzingSuggester which 
- is fuzzy
- is case insensitive (or must/should this be implemented by the analyzer?)
- does infix search
[- has a small memory footprint]

-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Friday, June 13, 2014 14:53
To: java-user@lucene.apache.org
Subject: fuzzy/case insensitive AnalyzingSuggester )

Looking for an AnalyzingSuggester which supports
- fuzziness
- case insensitivity
- small (in-memory) footprint (*)

(*) Just tried to hand my big IndexReader (see other post "[lucene 4.6] NPE 
when calling IndexReader#openIfChanged") into JaspellLookup. Got an OOM.
Is there any (Jaspell)Lookup implementation that can handle really big indexes 
(by swapping out part of the lookup table)?


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Lucene Facets Module 4.8.1

2014-06-22 Thread Jigar Shah
I will dig further into your suggestions, and also assert on the FacetsConfig
object.

While debugging, I found that the buildFacetsResult(...) method in
DrillSideways.java internally invokes the following constructor from
FastTaxonomyFacetCounts.java:

FastTaxonomyFacetCounts(...) {
// FacetsConfig.DEFAULT_INDEX_FIELD_NAME is "$facets"
this(FacetsConfig.DEFAULT_INDEX_FIELD_NAME, taxoReader, config, fc);
}

Shouldn't it invoke the following constructor with the correct indexFieldName?
In my case the indexFieldName is 'city', for the dimension 'CITY'.

 FastTaxonomyFacetCounts(String indexFieldName, TaxonomyReader taxoReader,
FacetsConfig config, FacetsCollector fc) throws IOException {
super(indexFieldName, taxoReader, config);
...
}
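
To illustrate the mismatch being described, here is a simplified model in plain Java. It is not Lucene code: Config and Counts are hypothetical stand-ins, and the only behavior modelled is the one reported in this thread, namely that a counts object is bound to one index field name and rejects a dimension that was configured for a different field.

```java
import java.util.HashMap;
import java.util.Map;

final class IndexFieldLookupDemo {
    static final String DEFAULT_INDEX_FIELD_NAME = "$facets";

    /** Hypothetical stand-in for a facets config: dimension -> index field name. */
    static final class Config {
        private final Map<String, String> fieldByDim = new HashMap<>();

        void setIndexFieldName(String dim, String indexFieldName) {
            fieldByDim.put(dim, indexFieldName);
        }

        String indexFieldNameOf(String dim) {
            // Dimensions without an explicit setting fall back to the default field.
            return fieldByDim.getOrDefault(dim, DEFAULT_INDEX_FIELD_NAME);
        }
    }

    /** Hypothetical stand-in for a counts object that reads exactly one index field. */
    static final class Counts {
        private final String indexFieldName;
        private final Config config;

        Counts(String indexFieldName, Config config) {
            this.indexFieldName = indexFieldName;
            this.config = config;
        }

        String topChildren(String dim) {
            String expected = config.indexFieldNameOf(dim);
            if (!expected.equals(indexFieldName)) {
                throw new IllegalArgumentException("dimension " + dim
                        + " was indexed into field " + expected + ", not " + indexFieldName);
            }
            return "counts for " + dim + " read from field " + indexFieldName;
        }
    }

    /** Returns true if counts bound to boundField can serve the given dimension. */
    static boolean canServe(String boundField, String dim) {
        Config config = new Config();
        config.setIndexFieldName("CITY", "city");
        try {
            new Counts(boundField, config).topChildren(dim);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canServe("city", "CITY"));                   // bound to the custom field
        System.out.println(canServe(DEFAULT_INDEX_FIELD_NAME, "CITY")); // bound to the default field
    }
}
```

If DrillSideways really does always build its counts against the default field, one direction worth checking against the 4.8.1 sources would be a DrillSideways subclass that overrides buildFacetsResult and passes the custom index field name explicitly; that is an assumption to verify, not a confirmed fix.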

Thanks
Jigar Shah.



On Sat, Jun 21, 2014 at 11:01 PM, Shai Erera ser...@gmail.com wrote:

 If you can, while in debug mode try to note the instance ID of the
 FacetsConfig, and assert it is indeed the same (i.e. indexConfig ==
 searchConfig).

 Shai


 On Sat, Jun 21, 2014 at 8:26 PM, Michael McCandless 
 luc...@mikemccandless.com wrote:

  Are you sure it's the same FacetsConfig at search time?  Because the
  exception implies your CITY field didn't have
  config.setIndexFieldName("CITY", "city") called.
 
  Or, can you try commenting out 'config.setIndexFieldName("CITY",
  "city")' at index time and see if the exception still happens?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Sat, Jun 21, 2014 at 1:08 AM, Jigar Shah jigaronl...@gmail.com
 wrote:
   Thanks for helping me.

   Yes, I did a couple of things.

   Below is the simple code I use for indexing:

   TrackingIndexWriter nrtWriter = ...
   DirectoryTaxonomyWriter taxoWriter = ...

   FacetsConfig config = new FacetsConfig();
   config.setHierarchical("CITY", true);
   config.setMultiValued("CITY", true);
   // I kept dimName different from indexFieldName
   config.setIndexFieldName("CITY", "city");

   Added indexing searchable fields...

   doc.add(new FacetField("CITY", "India", "Gujarat", "Vadodara"));
   doc.add(new FacetField("CITY", "India", "Gujarat", "Ahmedabad"));

   nrtWriter.addDocument(config.build(taxoWriter, doc));

   Below is the code I use for searching:

   TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);

   Query query = ...
   IndexSearcher searcher = ...
   DrillDownQuery ddq = new DrillDownQuery(config, query);
   // Config object is the same one I created before
   DrillSideways ds = new DrillSideways(searcher, config, taxoReader);
   DrillSidewaysResult result = ds.search(query, null, null, start + limit,
       null, true, true);
   ...
   Facets f = result.facets;
   // [Exception is generated here.] I didn't really perform any drill-down; it
   // is just the original query for the first time, but wrapped in a
   // DrillDownQuery.
   FacetResult fr = f.getTopChildren(5, "CITY");

   ... and the call below gives me an empty collection:

   List<FacetResult> frs = f.getAllDims(5);

   I debugged the source code and found that it internally calls

   // Config object is the same one I created before
   FastTaxonomyFacetCounts(indexFieldName, taxoReader, config)

   which then calls

   // Config object is the same one I created before
   IntTaxonomyFacets(indexFieldName, taxoReader, config)

   During these calls the value of indexFieldName is "$facets", defined by the
   constant 'public static final String DEFAULT_INDEX_FIELD_NAME = "$facets";'
   in FacetsConfig.

   My question is: if I am using the same FacetsConfig while indexing and
   searching, why is it not identifying the correct field name, and going for
   "$facets" instead?

   Please correct me if I have misunderstood, or point me to the correct way
   to solve the above problem.

   Many Thanks.
   Jigar Shah.
 
 
 



Re: EarlyTerminatingSortingCollector help needed..

2014-06-22 Thread Ravikumar Govindarajan
Thanks for your reply & clarifications.

 What do you mean by "When I use a SortField instead"? Unless you are
 using early termination, Collector.collect is supposed to be called
 for every matching document



For a normal sorting-query, on a top-level searcher, I execute

TopDocs docs = searcher.search(query, 50, new Sort(sortField));

Then I can issue reader.document() for final list of exactly 50 docs, which
gives me a global order across segments but at the obvious cost of memory...

SortingMergePolicy + ETSC will make me do 50*N [N=no.of.segments] collects,
which could increase cost of seeks when each segment collects considerable
hits...
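
As a rough check of the 50*N figure, the sketch below counts collect() calls under the assumption that each segment stops after numDocsToCollect hits, so a segment contributes min(matches, numDocsToCollect) calls. The class name and the per-segment match counts are made up for illustration.

```java
final class CollectCallEstimate {
    /** collect() calls when every segment terminates after numDocsToCollect hits. */
    static long earlyTerminatedCollects(int[] matchesPerSegment, int numDocsToCollect) {
        long total = 0;
        for (int matches : matchesPerSegment) {
            total += Math.min(matches, numDocsToCollect);
        }
        return total;
    }

    /** collect() calls without early termination: one per matching document. */
    static long fullCollects(int[] matchesPerSegment) {
        long total = 0;
        for (int matches : matchesPerSegment) {
            total += matches;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] matches = {1000, 20, 5}; // hypothetical match counts for three segments
        System.out.println(earlyTerminatedCollects(matches, 50)); // 50 + 20 + 5
        System.out.println(fullCollects(matches));                // 1000 + 20 + 5
    }
}
```

So 50*N is an upper bound that is only reached when every segment has at least 50 matches; segments with fewer matches contribute only what they have.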

 - you can afford the merging overhead (ie. for heavy indexing
 workloads, this might not be the best solution)
  - there is a single sort order that is used for most queries
  - you don't need any feature that requires to collect all documents
 (like computing the total hit count or facets).


Our use case fits all three of these points perfectly, and that's why we
wanted to explore this. But our final set of results must also be globally
ordered. Maybe it's a mistake to assume that sorting can be entirely
replaced with SMP + ETSC...
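
On the global-ordering concern, the underlying idea can be sketched without Lucene: if every segment is already sorted by the same key, the first N hits of each segment are that segment's best N, and merging those short lists gives the global best N. The code below is only a sketch of that merge step using plain lists of sort values; SegmentTopNMerge and its inputs are hypothetical and it does not model how Lucene's collectors do this internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class SegmentTopNMerge {
    /** Merges per-segment lists (each sorted ascending) into the global first n values. */
    static List<Long> mergeTopN(List<List<Long>> perSegmentSorted, int n) {
        // Queue entries are {segment index, position in segment}, ordered by the value they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> Long.compare(
                perSegmentSorted.get(a[0]).get(a[1]),
                perSegmentSorted.get(b[0]).get(b[1])));
        for (int seg = 0; seg < perSegmentSorted.size(); seg++) {
            if (!perSegmentSorted.get(seg).isEmpty()) {
                heads.add(new int[] {seg, 0});
            }
        }
        List<Long> result = new ArrayList<>();
        while (result.size() < n && !heads.isEmpty()) {
            int[] head = heads.poll();
            List<Long> segment = perSegmentSorted.get(head[0]);
            result.add(segment.get(head[1]));
            if (head[1] + 1 < segment.size()) {
                heads.add(new int[] {head[0], head[1] + 1}); // advance within the same segment
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Long>> segments = new ArrayList<>();
        segments.add(List.of(1L, 4L, 9L));
        segments.add(List.of(2L, 3L, 10L));
        segments.add(List.of(5L));
        System.out.println(mergeTopN(segments, 4));
    }
}
```

The merge itself touches at most N entries per segment, so global order does not require visiting every matching document as long as each segment's own order matches the query's sort.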

I would not advise to use the stored fields API, even in the context
 of early termination. Doc values should be more efficient here?


I read your excellent blog on stored-fields compression, where you've
mentioned that stored-fields now take only one random seek. [
http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1
]

If so, then what could make DocValues still a winner?

--
Ravi


On Sat, Jun 21, 2014 at 6:41 PM, Adrien Grand jpou...@gmail.com wrote:

 Hi Ravikumar,

 On Fri, Jun 20, 2014 at 12:14 PM, Ravikumar Govindarajan
 ravikumar.govindara...@gmail.com wrote:
  If my numDocsToCollect = 50 and no.of. segments = 15, then
  collector.collect() will be called 750 times.

 That is the worst-case indeed. However if some of your segments have
 less than 50 matches, `collect` will only be called on those matches.

  When I use a SortField instead, then TopFieldDocs does the sorting for all
  segments and collector.collect() will be called only 50 times...

 What do you mean by "When I use a SortField instead"? Unless you are
 using early termination, Collector.collect is supposed to be called
 for every matching document.

  Assuming a stored-field seek for every collector.collect(), will it be
  advisable to still persist with ETSC? Was it introduced as a trade-off b/n
  memory & disk?

 I would not advise to use the stored fields API, even in the context
 of early termination. Doc values should be more efficient here?

 The trade-off is not really about memory and disk. What it tries to
 achieve is to make queries much faster provided that:
  - you can afford the merging overhead (ie. for heavy indexing
 workloads, this might not be the best solution)
  - there is a single sort order that is used for most queries
  - you don't need any feature that requires to collect all documents
 (like computing the total hit count or facets).

 --
 Adrien





Re: A question about FacetField constructor

2014-06-22 Thread west suhanic
Hello:

What do you mean by "does not index anything"?

When I do a search, the value returned for the dim set to "Publish Date"
is null. If I pass through value[0], the publish date year is returned by
the search.

setHierarchical was called.

When a String[] with more than one element is passed, no exception is
thrown.

I am open to all suggestions as to what I am missing.

regards,

west suhanic





Re: A question about FacetField constructor

2014-06-22 Thread Shai Erera
Reply wasn't sent to the list.
On Jun 22, 2014 8:15 PM, Shai Erera ser...@gmail.com wrote:

 Can you post an example which demonstrates the problem? It's also
 interesting how you count the facets, eg do you use a TaxonomyFacets object
 or something else?

 Have you looked at the facet demo code? It contains examples for using
 hierarchical facets.

 Shai