DrillSideways accepting FacetCollector parameter

2014-07-08 Thread Jigar Shah
Currently DrillSideways provides the following method:

public DrillSidewaysResult search(DrillDownQuery query, Collector hitCollector);

Could the same class also provide the following method?

public DrillSidewaysResult search(DrillDownQuery query, Collector hitCollector, FacetsCollector facetCollector);

Currently,

 FacetsCollector drillDownCollector = new FacetsCollector();

is created inside the API method

public DrillSidewaysResult search(DrillDownQuery query, Collector hitCollector) throws IOException

Could that collector be passed in as a parameter instead?

This would let an application use the same FacetsCollector to fetch other facets,
i.e. non-sideways facets.

Thanks,
Jigar Shah.


re-use IndexWriter

2014-07-08 Thread Jason.H
Lately I've been trying every way I can to improve indexing performance.
IndexWriter's close operation is really costly, and the Lucene docs suggest
re-using the IndexWriter instance. I did that: I keep one IndexWriter instance
and hand it to every request thread. But this causes a big problem: my searches
never see the index changes, because the changes are still in RAM. Maybe there
is a way to flush all the changes to stable storage without closing the
IndexWriter, so that I can keep re-using it. Am I right on this point?

There are several points I don't quite understand:

1. What is the difference between commit and flush? I thought that with these
two methods I could see the changes in my Directory without closing the
IndexWriter.

2. When should I close the writer? If I use it as a singleton (so I don't have
to worry about LockObtainFailedException), and commit and flush take care of
the changes, then I never have to close it ...

Re: re-use IndexWriter

2014-07-08 Thread Ian Lea
Read the javadocs to understand the difference between commit() and
flush().  You need commit(), or close().

There are no hard and fast rules and it depends on how much data you
are indexing, how fast, how many searches you're getting and how up to
date they need to be.  And how much you worry about losing indexed
data.

One option is to pick a value that makes sense to you and commit() the
writer every n seconds|minutes|hours|docs.  close() it when your
indexing job exits.  You'll need to reopen index searchers to pick up
changes.  See the javadocs for IndexSearcher.
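
As a rough sketch of the "commit every n docs" option (not taken from the
javadocs; the class and interface names below are made up for illustration),
a small counter can run a commit action after every n added documents. In a
real indexer the action would presumably be something like writer::commit on
the shared IndexWriter, and searchers would still have to be reopened
afterwards to see the changes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not a Lucene class): commits after every n added docs. */
final class CommitEveryN {

    /** Stand-in for the real commit call, e.g. writer::commit on an IndexWriter. */
    interface CommitAction {
        void commit() throws IOException;
    }

    private final long n;
    private final CommitAction action;
    private final AtomicLong added = new AtomicLong();

    CommitEveryN(long n, CommitAction action) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.action = action;
    }

    /** Call once after each successfully added or updated document. */
    void documentAdded() {
        // incrementAndGet hands every caller a distinct count, so exactly one
        // thread sees each multiple of n and runs the commit for it.
        if (added.incrementAndGet() % n == 0) {
            try {
                action.commit();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Usage would be along the lines of new CommitEveryN(10000, writer::commit),
with documentAdded() called from each indexing thread; the value 10000 is
arbitrary, and close() is still needed when the indexing job exits.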

Another option is to use Lucene's near-real-time (NRT) features.  Also
see the IndexSearcher javadocs for a way in to that.


--
Ian.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: DrillSideways accepting FacetCollector parameter

2014-07-08 Thread Michael McCandless
We could do this, but what's the use case?

E.g. DrillSideways also "hardwires" the drill-sideways collectors it
creates ... do you need control over those as well?  Maybe we could make
methods in the DrillSideways class that you could override?

Mike McCandless

http://blog.mikemccandless.com





Re: Incremental Field Updates

2014-07-08 Thread Ravikumar Govindarajan
That's a cool patch. Thanks


On Thursday, July 3, 2014, Gopal Patwa  wrote:

> Thanks Ravi, it is good to know the general problems with updatable fields.
> In our use case we have a few fields which update more frequently than the
> main index. We are using this SOLR join contrib patch with a DocTransformer
> for returning data from the join core. This approach has some performance
> impact, but if that performance hit is acceptable for your use case then you
> can give it a try if you are using SOLR.
>
> https://issues.apache.org/jira/browse/SOLR-4787
>
>
>
>
>
> On Thu, Jul 3, 2014 at 3:22 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com > wrote:
>
> > In case of sorting, updatable DocValues may be what you are looking for.
> >
> > But updatable fields for searching is a different beast.
> >
> > A sample approach is documented at
> >
> > http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/
> >
> > The general problems with updatable postings lists, AFAIK, are
> >
> > 1. Impossible to correctly score updated documents
> > 2. Segment merges could miss out updates
> > 3. Might behave incorrectly with NRT
> > 4. Frequent updates could end up creating lots of files because of the
> > append-only nature of Lucene...
> >
> > Maybe, if you are not too worried about scoring, correct NRT behavior etc.,
> > you can attempt a solution like the RedisCodec stuff...
> >
> > Segregating static & dynamic fields into 2 separate indexes as described
> > here
> >
> > http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management
> > may be of some use to you
> >
> > --
> > Ravi
> >
> >
> >
> > > On Wed, Jul 2, 2014 at 7:29 PM, Shai Erera wrote:
> >
> > > > Using BinaryDocValues is not recommended for all scenarios. It is a
> > > > "catchall" alternative to the other DocValues types. I would not use it
> > > > unless it makes sense for your application, even if it means that you
> > > > need to re-index a document in order to update a single field.
> > > >
> > > > DocValues are not good for "search" - by search I assume you mean take
> > > > a query such as "apache AND lucene" and find all documents which contain
> > > > both terms under the same field. They are good for sorting and faceting
> > > > though.
> > > >
> > > > So I guess the answer to your question is "it depends" (it always is!) -
> > > > I would use DocValues for sorting and faceting, but not for regular
> > > > search queries. And I would use BinaryDocValues only when the other
> > > > DocValues types don't match.
> > > >
> > > > Also, note that the current field-level update of DocValues is not
> > > > always better than re-indexing the document, you can read here for more
> > > > details:
> > > >
> > > > http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html
> > >
> > > Shai
> > >
> > >
> > > On Tue, Jul 1, 2014 at 9:17 PM, Sandeep Khanzode <
> > > sandeep_khanz...@yahoo.com.invalid> wrote:
> > >
> > > > Hi Shai,
> > > >
> > > > So one follow-up question.
> > > >
> > > > > Assume that my use case is to have approx. ~50M documents indexed,
> > > > > with each document having about ~10-15 indexed but not stored fields.
> > > > > These fields will never change, but there are another ~5-6 fields that
> > > > > will change and will continue to change after the index is written.
> > > > > These ~5-6 fields may also be multivalued. The size of this index
> > > > > turns out to be ~120GB.
> > > > >
> > > > > In this case, I would like to sort or facet or search on these ~5-6
> > > > > fields. Which approach do you suggest? Should I use BinaryDocValues
> > > > > and update using IW, or use either a ParallelReader/Join query?
> > > >
> > > > ---
> > > > Thanks n Regards,
> > > > Sandeep Ramesh Khanzode
> > > >
> > > >
> > > > > On Tuesday, July 1, 2014 9:53 PM, Shai Erera wrote:
> > > >
> > > >
> > > >
> > > > Except that Lucene now offers efficient numeric and binary DocValues
> > > > updates. See IndexWriter.updateNumeric/Binary...
> > > >
> > > > > On Jul 1, 2014 5:51 PM, "Erick Erickson" wrote:
> > > >
> > > > > This JIRA is "complicated", don't really expect it in 4.9 as it's
> > > > > been hanging around for quite a while. Everyone would like this,
> > > > > but it's not easy.
> > > > >
> > > > > > Atomic updates will work, but you have to set stored="true" for all
> > > > > > source fields. Under the covers this actually reads the document
> > > > > > out of the stored fields, deletes the old one and adds it
> > > > > > over again.
> > > > >
> > > > > FWIW,
> > > > > Erick
> > > > >
> > > > > On Tue, Jul 1, 2014 at 5:32 AM, Sandeep Khanzode
> > > > >  wrote:
> > > > > > Hi,
> > > > > >
> > > > > > > I wanted to know of the best approach to follow if a few fields in
> > > > > > > my indexed documents are changing at run time (after index and
> > > > > > > before or during search), but a majority of them are created at
> > > > > > > index time.
> > > > > >
> > > > > > I could see the JIRA given

Adding/removing a term from a document

2014-07-08 Thread Allen Kneser
Hi all,

I am trying to figure out how to easily remove or add a keyword from a
document's index (or equivalently, decrease/increase that keyword's
frequency in the document).

I know Lucene allows you to reindex a document using the
IndexWriter.updateDocument(docPath, doc) call but that's too expensive for
my purposes. I already know the removed & added keywords from a previous
pass through the document and I would like to avoid Lucene doing another
pass.

I am looking for something like IndexWriter.adjustTermFreqInDoc("keyword", doc,
deltafreq), which would change the frequency of "keyword" in 'doc' by
'deltafreq'. This could result in either adding or removing a keyword from
the document in the index.

Is there a way to do this? At first I thought adding term vectors to the
index could help with this but it seems like that will dramatically
increase the index size.

Cheers,
Alin


IndexSearcher.doc thread safe problem

2014-07-08 Thread 김선무
Hi all,

I know IndexSearcher is thread safe.
But maybe IndexSearcher.doc is not thread safe...

This is what I tried.

First, I collect the matching docIDs from the index and add each docID to a
queue (ConcurrentLinkedQueue).

Second, after that collection step has finished, I poll docIDs from the queue
and extract a field value for each one. This step runs on multiple threads.

For this I used code along the following lines (condensed):
searcher.search( query, filter, new Collector() {
    // other Collector methods (including the one that sets docBase) omitted
    public void collect( int doc ) { queue.add( docBase + doc ); }
} );
Thread thread1 = new Thread( () -> {
    while( !queue.isEmpty() ) {
        System.out.println( searcher.doc( queue.poll() ).get( "content" ) );
    }
} );
Thread thread2 = new Thread( thread1 );
thread1.start();
thread2.start();
---

The result was different on every execution.

Is my method wrong, or is this an IndexSearcher bug?

Please help me.
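
One thing that may explain the varying results without any IndexSearcher bug
(this is a guess, nothing in the thread confirms it): in the loop above,
isEmpty() and poll() are two separate calls, so with two threads the other
thread can take the last docID in between; poll() then returns null and
unboxing it for searcher.doc(int) throws a NullPointerException. The order in
which two threads print is also not fixed from run to run. A minimal sketch of
a drain loop that avoids both issues, using only java.util.concurrent; the
loader parameter is a made-up stand-in for id -> searcher.doc(id).get("content"),
and nothing here is Lucene API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.function.IntFunction;

/** Hypothetical helper: drains a queue of docIDs with several worker threads. */
final class QueueDrain {

    static List<String> drain(Queue<Integer> docIds, IntFunction<String> loader, int threads) {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        Runnable worker = () -> {
            Integer id;
            // poll() returns null once the queue is empty, so test its result
            // instead of calling isEmpty() first: another thread may take the
            // last element between the two calls.
            while ((id = docIds.poll()) != null) {
                results.add(loader.apply(id));
            }
        };
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(worker);
            pool.add(t);
            t.start();
        }
        try {
            for (Thread t : pool) {
                t.join(); // wait for every worker before reading the results
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        // Arrival order differs between runs, so sort before comparing runs.
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }
}
```

With a ConcurrentLinkedQueue as the docIds argument, two runs over the same
docIDs should then return the same sorted list, which makes it easier to tell
a real difference in content from a difference in thread scheduling.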


Re: DrillSideways accepting FacetCollector parameter

2014-07-08 Thread Jigar Shah
Use case:

I perform the search with the code below.

DrillSideways drillSideWays = new DrillSideways(searcher, config, engine.getTaxoReader());
DrillSidewaysResult result = drillSideWays.search(filterQuery, null, null, first + limit, sort, true, true);

In the above code I don't have a reference to the FacetsCollector fc that is used.
Suppose I want to get LongRangeFacetCounts, which is based on a
NumericDocValuesField.

facets = new LongRangeFacetCounts(facetField.getQueryName(), fc, longRanges.toArray(new LongRange[longRanges.size()]));

If I use the code below, I do have access to the current fc.

FacetsCollector fc = new FacetsCollector();
TopDocs topDocs = FacetsCollector.search(searcher, query, null, first + limit, sort, true, true, fc);

The difference is that with 'FacetsCollector.search(searcher, query, null,
first + limit, sort, true, true, fc);' I can get hold of the FacetsCollector.
This is not the case with DrillSideways.

Let me know if some other way is already provided.

Thanks,
Jigar Shah.





