Re: Non-index files under the search directory

2016-11-22 Thread Xiaolong Zheng
Hi András,

Thanks, this is what I need!

I also noticed that this user commit data does not carry over when I
consolidate several search databases into a new one. I guess the solution
is to explicitly call getCommitData for each sub-index and then set the
result on the new consolidated search database, right?

Best,

--Xiaolong
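
The consolidation step described above could be sketched roughly as follows. This is a minimal sketch, assuming each sub-index's commit data has already been read into a Map<String, String> (for example via the getCommitData call named in this thread). The class name CommitDataMerger is hypothetical and no Lucene classes are used; the merged map is what would then be handed to setCommitData on the consolidated index's writer before committing. Failing on conflicting values is just one possible policy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: merges per-index commit data maps.
final class CommitDataMerger {

    // Combine the key-value maps read from each sub-index into a single map.
    // Throws if two sub-indexes disagree on a key (e.g. different stemmers).
    static Map<String, String> merge(List<Map<String, String>> perIndexData) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> data : perIndexData) {
            for (Map.Entry<String, String> e : data.entrySet()) {
                // putIfAbsent returns the value already stored, or null if the key was new.
                String previous = merged.putIfAbsent(e.getKey(), e.getValue());
                if (previous != null && !previous.equals(e.getValue())) {
                    throw new IllegalStateException(
                        "Conflicting commit data for key '" + e.getKey()
                            + "': " + previous + " vs " + e.getValue());
                }
            }
        }
        return merged;
    }
}
```

Rejecting conflicts makes sense for something like a stemmer indicator, since sub-indexes built with different stemmers probably should not be consolidated silently.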




Re: Non-index files under the search directory

2016-11-22 Thread amarnath cse
Can anyone tell me the procedure for indexing text documents using Lucene?

Thanks..


Re: Non-index files under the search directory

2016-11-22 Thread András Péteri
Hi Xiaolong,

A Map of key-value pairs can be supplied to
IndexWriter#setCommitData(Map) and will be persisted
when committing changes (setting the commit data counts as a change).
It can be retrieved with IndexWriter#getCommitData() later.

This may serve as good storage for metadata; as an example,
Elasticsearch stores attributes related to its transaction log there
(UUID and generation identifier).

Regards,
András


-- 
András Péteri

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Non-index files under the search directory

2016-11-22 Thread Xiaolong Zheng
Thanks. StoredField still seems to work at the per-document level, which
means every document would contain this field.

What I would really like is index-level (global) storage to hold this single
value. Maybe this is impossible.

Sincerely,

--Xiaolong




Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-22 Thread Chitra R
Kindly post your suggestions.

Regards,
Chitra

On Sat, Nov 19, 2016 at 1:38 PM, Chitra R  wrote:

> Hey, I understand it clearly now. Thank you so much. Could you please help
> us implement it in our use case?
>
>
> In our case, we have a dynamic index and it is variable depth too, so
> flat facets are enough. There is no need for hierarchical facets.
>
> What I have in mind is:
>
>
>    1. Index my facet field as a normal doc values field, so that no special
>       operation (like taxonomy or sorted set doc values facet field) is
>       done at index time and the doc values field alone stores the ordinals.
>    2. At search time, pass the query (the user's search query) and the
>       filter (the list of paths traversed), and collect the matching
>       documents in FacetsCollector.
>    3. To compute facet counts for a specific field, take the resulting
>       docs, then move through each segment and collect the matching
>       ordinals using AtomicReader.
>
>
> I know that with this approach I can't calculate facet counts for more than
> one field (facet) in a single search.
>
> Instead of loading all the dimensions in DocValuesReaderState at search
> time (which takes more time and memory), loading only specific fields
> should take less time and memory, I hope. Kindly help me work this out.
>
>
> I think this keeps index and search costs minimal: it should not add
> overhead at index time, and it should also be better at search time.
>
>
> Kindly post your suggestions.
>
>
> Regards,
> Chitra
>
>
>
>
> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I think you've summed up exactly the differences!
>>
>> And, yes, it would be possible to emulate hierarchical facets on top
>> of flat facets, if the hierarchy is fixed depth like year/month/day.
>>
>> But if it's variable depth, it's trickier (but I think still
>> possible).  See e.g. the Committed Paths drill-down on the left, on
>> our dog-food server
>> http://jirasearch.mikemccandless.com/search.py?index=jira
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R  wrote:
>> > Case 1:
>> > With taxonomy, for each indexed document, the facet labels are examined
>> > and their ordinals and mappings are computed and stored in the sidecar
>> > index at index time.
>> >
>> > Case 2:
>> > With doc values, these ordinals are computed at search time, so I expect
>> > a time and memory trade-off between the two cases.
>> >
>> >
>> > With taxonomy, building hierarchical facets at index time makes faceting
>> > cheaper at search time than flat facets in doc values.
>> >
>> > Apart from memory, time and NRT latency, is there any other difference
>> > between hierarchical and flat facets at search time?
>> >
>> >
>> > Kindly post your suggestions...
>> >
>> >
>> > Regards,
>> > Chitra
>> >
>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R wrote:
>> >>
>> >> Okay, I agree with you: Taxonomy maintains and supports hierarchical
>> >> facets during indexing. I take hierarchical to mean that we might index
>> >> the field Publish date: 2010/10/15 as Publish date: 2010, Publish date:
>> >> 2010/10 and Publish date: 2010/10/15; their facet ordinals are maintained
>> >> in the sidecar index and mapped to the main index.
>> >>
>> >> For example:
>> >>
>> >> On search-lucene.com, I enter a term (say, facet), and the top documents
>> >> and their categories are displayed after the search is performed. Say I
>> >> drill down through Publish date/2010 to collect its child counts, and
>> >> then through Publish date/2010/10 to collect its child counts. For each
>> >> drill-down, a new search is performed to collect its top docs and
>> >> categories.
>> >>
>> >>
>> >> I can achieve even this with flat facets by changing the drill-down
>> >> query.
>> >>
>> >> Am I right, or have I missed anything?
>> >>
>> >> So what is the need for hierarchical facets? Could you please explain
>> >> hierarchical facets in a real-world use case?
>> >>
>> >>
>> >> Regards,
>> >> Chitra
>> >>
>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
>> >>  wrote:
>> >>>
>> >>> You store dimension + string (a single value path, since it's not
>> >>> hierarchical) into SSDVFF so that you can compute facet counts, either
>> >>> ordinary drill down counts or the drill sideways counts.
>> >>>
>> >>> You can see examples of drill sideways at
>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>> >>> fields on the left and you don't lose the previous facet counts for
>> >>> that field.
>> >>>
>> >>> Mike McCandless
>> >>>
>> >>> http://blog.mikemccandless.com
>> >>>
>> >>>
>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R wrote:
>> 
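
The idea discussed in the quoted messages, emulating a fixed-depth hierarchy such as year/month/day on top of flat facets, can be illustrated with a small sketch. It assumes each path is indexed as a set of flat labels ("2010", "2010/10", "2010/10/15"), as described in the thread. The class FlatFacetPaths and its methods are hypothetical and use no Lucene classes; in a real index the labels would live in a facet field and the counting would be done by the facets module.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: models hierarchical paths as flat labels.
final class FlatFacetPaths {

    // Expand "2010/10/15" into ["2010", "2010/10", "2010/10/15"].
    static List<String> expand(String path) {
        List<String> labels = new ArrayList<>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            labels.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        labels.add(path);
        return labels;
    }

    // Count the immediate children of `parent` across the matching documents.
    // docLabels holds one expanded label list per matching document;
    // an empty parent means "count the top-level labels".
    static Map<String, Integer> childCounts(List<List<String>> docLabels, String parent) {
        String prefix = parent.isEmpty() ? "" : parent + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> labels : docLabels) {
            for (String label : labels) {
                boolean underParent = label.startsWith(prefix) && label.length() > prefix.length();
                // An immediate child has no further '/' after the parent prefix.
                if (underParent && label.indexOf('/', prefix.length()) < 0) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Drilling down from "2010" to "2010/10" then amounts to changing the parent passed to childCounts, which mirrors the "changing the drill down query" step mentioned above. Variable-depth paths work with the same expansion, but every document pays for one label per level.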

Re: Non-index files under the search directory

2016-11-22 Thread Michael McCandless
Lucene won't merge foreign files for you, and in general it's
dangerous to put such files into Lucene's index directory because if
they look like codec files Lucene may delete them.

Can you just add a StoredField to each document to hold your information?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 21, 2016 at 11:38 PM, Xiaolong Zheng wrote:
> Hello,
>
> I am trying to add some metadata to the search database. Instead of
> adding a new search field or adding a phony document, I am looking at the
> method org.apache.lucene.store.Directory#createOutput, which creates a new
> file in the search directory.
>
> I am wondering whether IndexWriter can also merge this non-index file while
> it is merging multiple search indexes.
>
> Stepping back a little, what is the best way to add metadata to the
> search database?
>
> For example, I would like to add an indicator showing which kind of
> stemmer was used when the index was created.
>
> Thanks,
>
> --Xiaolong




Re: 'OR' search using Automaton query

2016-11-22 Thread Michael McCandless
Yes, just use Operations.union to merge each of your OR'd automata
into a single one, and then search on that automaton.
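
To illustrate only the semantics of that union, here is a minimal stand-in sketch. It deliberately swaps Lucene's Automaton for java.util.function.Predicate: a matcher accepts or rejects a term, and the union accepts a term if any of its members does. The class TermMatchers and the contains helper are hypothetical; in Lucene itself the combining step would be the Operations.union call mentioned above.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Stand-in for automata: each Predicate accepts or rejects a single term.
// This is NOT Lucene's Automaton class; it only models accept/reject behaviour.
final class TermMatchers {

    // Union: accept a term if any of the given matchers accepts it.
    static Predicate<String> union(List<Predicate<String>> matchers) {
        Predicate<String> result = term -> false; // the empty union accepts nothing
        for (Predicate<String> m : matchers) {
            result = result.or(m);
        }
        return result;
    }

    // Hypothetical per-word matcher: "*word*" style containment, case-insensitive.
    static Predicate<String> contains(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        return term -> term.toLowerCase(Locale.ROOT).contains(needle);
    }
}
```

With one matcher per query word, a single pass over the terms with the union finds every term that any of the words would have matched, which is the OR behaviour asked about.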

But remember that leading wildcard searches are tremendously costly:
they require a full scan of all terms in the index.
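
Why a leading wildcard forces a full scan can be seen in a simplified model. This sketch assumes the term dictionary behaves like a sorted set of strings; the SortedTerms class is hypothetical and is not how Lucene stores terms internally. A prefix lookup can seek to its starting point and stop early, while a suffix lookup has no starting point and must examine every term.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Simplified model of a sorted term dictionary (not Lucene's actual structure).
final class SortedTerms {
    private final TreeSet<String> terms = new TreeSet<>();
    int visited; // number of terms examined by the most recent lookup

    SortedTerms(Collection<String> all) {
        terms.addAll(all);
    }

    // Trailing wildcard ("luc*"): seek to the prefix, stop at the first non-match.
    List<String> withPrefix(String prefix) {
        visited = 0;
        List<String> hits = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            visited++;
            if (!t.startsWith(prefix)) {
                break;
            }
            hits.add(t);
        }
        return hits;
    }

    // Leading wildcard ("*ene"): nothing to seek to, so every term is examined.
    List<String> withSuffix(String suffix) {
        visited = 0;
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            visited++;
            if (t.endsWith(suffix)) {
                hits.add(t);
            }
        }
        return hits;
    }
}
```

In this model the prefix lookup touches only the matching terms plus one, whereas the suffix lookup always touches all of them, so its cost grows with the total number of distinct terms in the index.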

Mike McCandless

http://blog.mikemccandless.com


On Tue, Nov 22, 2016 at 12:03 AM, hariram ravichandran
 wrote:
> I use an automaton query to combine fuzzy and wildcard queries (for example, a
> query on "*lucy*" should also return "*lucene*"). That's working great.
>
> Now, if I search for "*lucy Automaton query*", I want all the documents
> containing *lucene* or *Automaton* or *query* or *lucene Automaton* or *lucene
> query* or *automaton query* or *lucene Automaton query*.
>
> Is an OR search possible using automaton?
