How does sorting work in Lucene?

2016-02-28 Thread Gimantha Bandara
Hi all,

We are using Lucene to index our data and maintain millions of documents in
sharded indices. Currently we read each shard separately, collect the
TopDocs using a TopDocCollector, sort them by score, and return the
top-scoring documents. I think a MultiReader could replace this logic.

But I have some questions regarding sorting by a specific field or fields.

1. Does Lucene sort at search time, or does it store sort information at
index time in some way?

2. How would I implement pagination for a sorted set of documents? I have
several shards, and each shard may contain millions of records. Getting the
first few pages, each page having 100 documents or so, may be fine. But let's
say I want to get the 1000th page: I would have to sort the whole document
set of all the shards and only then extract the 1000th page. Does Lucene
support pagination?
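For what it's worth, Lucene's own answer to deep paging is `IndexSearcher.searchAfter`, which resumes collection below the last hit of the previous page, so each call only keeps `pageSize` entries in its priority queue. A minimal sketch against the Lucene 5.x API (class and variable names are illustrative, not from this thread):

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;

// Sketch: page through a sorted result set with searchAfter instead of
// collecting page * pageSize hits up front.
public final class DeepPaging {
    static ScoreDoc[] nextPage(IndexSearcher searcher, Query query, Sort sort,
                               int pageSize, ScoreDoc after) throws IOException {
        TopDocs top = (after == null)
                ? searcher.search(query, pageSize, sort)              // first page
                : searcher.searchAfter(after, query, pageSize, sort); // later pages
        return top.scoreDocs;
    }
}
```

To land on page 1000 you still walk pages 1 through 999 once, but memory per request stays bounded. Sorting itself happens at search time; what is prepared at index time (doc values) is the per-document column the sort reads, which is what keeps search-time sorting cheap.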

Help is much appreciated.
-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How does sorting work in Lucene?

2016-02-29 Thread Gimantha Bandara
Any thoughts?



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Getting an Exception while searching when (numHits = Large Number) in TopScoreDocCollector

2016-03-01 Thread Gimantha Bandara
I know that I am getting this exception because the priority queue tries to
allocate more memory than my PC has available; with a numHits this large, the
size of the queue's backing array can even overflow an int, which is what the
NegativeArraySizeException indicates.

ERROR
{org.wso2.carbon.analytics.dataservice.core.indexing.AnalyticsDataIndexer}
- Error in index search: null
java.lang.NegativeArraySizeException
    at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:58)
    at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:64)
    at org.apache.lucene.search.TopScoreDocCollector.<init>(TopScoreDocCollector.java:184)
    at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector.<init>(TopScoreDocCollector.java:53)
    at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:174)
    at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:154)

Is there a way to get the matching documents in a streaming fashion?
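Since a top-N collector must allocate a numHits-sized queue up front, one way around it is to not rank at all: a custom collector sees every matching doc as it is found. A sketch against the Lucene 5.x `SimpleCollector` API; the processing body is a placeholder:

```java
import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

// Sketch: visit every matching doc without building a numHits-sized
// priority queue. Lucene pushes hits to the collector segment by segment.
public final class StreamingCollector extends SimpleCollector {
    private int docBase; // doc id offset of the current segment

    @Override
    protected void doSetNextReader(LeafReaderContext context) {
        this.docBase = context.docBase;
    }

    @Override
    public void collect(int doc) throws IOException {
        int globalDoc = docBase + doc;
        // Process the hit here (e.g. buffer a page, stream an id out).
    }

    @Override
    public boolean needsScores() {
        return false; // skip scoring entirely if only the doc ids matter
    }
}
// Usage: searcher.search(query, new StreamingCollector());
```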

Thanks,
Gimantha


Re: How does sorting work in Lucene?

2016-03-12 Thread Gimantha Bandara
We are using our own clustering mechanism based on Hazelcast. Sorting works
fine when our server runs in standalone mode: Lucene returns the doc ids in
the sorted order, but the score is always 1.0. Is this expected, or am I
doing something wrong? (Please note that the doc id order is returned
correctly.) Since the score is always 1.0, we have no way to sort results
from several nodes when we cluster our servers. Even if the score were
returned, I doubt whether it would be usable, since the score may be relative
to the index on which the search is performed (so the score of a document
from one index cannot be compared to another document's score from a
different index). If we assume that the score is properly returned, can I use
the scores of the docs from different indices to sort all the doc ids from
all the indexes (probably using merge sort)?
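A note on the 1.0 scores: when an `IndexSearcher` sorts by a field (or the query is wrapped in a `ConstantScoreQuery`), real relevance scores are not computed unless explicitly requested. Also, if every node sorts by the same field, the sort keys themselves are comparable across indices even though scores are not. A sketch of both pieces, assuming the Lucene 5.x API (topN of 100 is arbitrary):

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopFieldDocs;

// Sketch: request real scores when sorting by a field, and merge
// already-sorted per-node results with TopDocs.merge.
public final class ShardMerge {
    static TopDocs searchWithScores(IndexSearcher searcher, Query query, Sort sort)
            throws IOException {
        // With a Sort, scores are not computed unless doDocScores is true.
        return searcher.search(query, 100, sort,
                true,   // doDocScores: fill each hit's real relevance score
                false); // doMaxScore
    }

    static TopDocs mergeNodes(Sort sort, TopFieldDocs[] perNodeHits) {
        // Merges shard results that were each sorted by the same Sort.
        return TopDocs.merge(sort, 100, perNodeHits);
    }
}
```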



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


How to get the top facets values of a field/dimension which has the highest number of immediate children

2016-03-15 Thread Gimantha Bandara
Hi,

Let's say I have thousands of Lucene documents, each with a FacetField of the
following form:

doc.add(new FacetField("Category", "level0", "level1", "level2", "level3",
...));

"Category" is the dimension name. "level0" is the first hierarchical level,
"level1" is the second level (the immediate children of "level0"), and so on.
How do I get the top "level0" values of "Category" that have the highest
number of unique "level1" values, along with that unique "level1" count?
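One possible way, assuming the dimension was made hierarchical via `FacetsConfig.setHierarchical("Category", true)` and `facets` is e.g. a `FastTaxonomyFacetCounts` built from a `FacetsCollector`: `FacetResult.childCount` reports the number of distinct immediate children under a path. Note that `getTopChildren` ranks level0 labels by document count, so to rank strictly by unique-child count you would collect the pairs and sort them yourself:

```java
import java.io.IOException;

import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.LabelAndValue;

// Sketch: childCount of a sub-result = number of distinct immediate
// children, i.e. the "unique level1 count" for each level0 label.
public final class UniqueChildCounts {
    static void printUniqueLevel1Counts(Facets facets) throws IOException {
        FacetResult level0 = facets.getTopChildren(10, "Category");
        if (level0 == null) return; // dimension not present in the hits
        for (LabelAndValue lv : level0.labelValues) {
            FacetResult sub = facets.getTopChildren(1, "Category", lv.label);
            int uniqueLevel1 = (sub == null) ? 0 : sub.childCount;
            System.out.println(lv.label + " -> " + uniqueLevel1 + " unique level1 values");
        }
    }
}
```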

Thanks,
Gimantha


Re: GROUP BY in Lucene

2016-03-18 Thread Gimantha Bandara
Hi Rob,

Thanks a lot for the above very descriptive answer. I will give it a try.

On Friday, March 18, 2016, Rob Audenaerde  wrote:

> Hi Gimantha,
>
> You don't need to store the aggregates, and you don't need to retrieve
> Documents. The aggregates are calculated during collection using the
> BinaryDocValues from the facet module. What I do is store the
> values in the facets using AssociationFacetFields (for example
> FloatAssociationFacetField). I chose facets because then I can use
> the facets as well :)
>
> I have an implementation of the `Facets` class that does all the aggregation. I
> cannot paste all the code unfortunately, but here is the idea (it is loosely
> based on the TaxonomyFacetSumIntAssociations implementation, where you can
> look up how the BinaryDocValues are translated to ordinals and to facets).
> This aggregation is used in conjunction with a FacetsCollector, which
> collects the facets during a search:
>
> FacetsCollector fc = new FacetsCollector();
> searcher.search(new ConstantScoreQuery(query), fc);
>
>
> Then, use this FacetsCollector:
>
>  taxoReader = getTaxonomyReaderManager().acquire();
>  OnePassTaxonomyFacets facets = new OnePassTaxonomyFacets(taxoReader,
> LuceneIndexConfig.facetConfig);
>  Collection
> facets.aggregateValues(fc.getMatchingDocs(), p.getGroupByListWithoutData(),
> aggregateFields);
>
>
> The aggregateValues method (cannot paste it all :(  ) :
>
>
> public final Collection
> aggregateValues(List matchingDocs, final List
> groupByFields,
> final List aggregateFieldNames, EmptyValues
> emptyValues) throws IOException {
> LOG.info("Starting aggregation for pivot.. EmptyValues=" +
> emptyValues);
>
> //We want to group a list of ordinals to a list of aggregates. The
> taxoReader has the ordinals, so a selection like 'Lang=NL, Region=South'
> will
> //end up like a MultiIntKey of [13,44]
> Map> aggs = Maps.newHashMap();
>
> List groupByFieldsNames = Lists.newArrayList();
> for (GroupByField gbf : groupByFields) {
> groupByFieldsNames.add(gbf.getField().getName());
> }
> int groupByCount = groupByFieldsNames.size();
>
> //We need to know which ordinals are the 'group-by' ordinals, so
> we can check whether an ordinal that is found belongs to one of these fields
> int[] groupByOrdinals = new int[groupByCount];
> for (int i = 0; i < groupByOrdinals.length; i++) {
> groupByOrdinals[i] =
> this.getOrdinalForListItem(groupByFieldsNames, i);
> }
>
> //We need to know which ordinals are the 'aggregate-field'
> ordinals, so we can check whether an ordinal that is found belongs to one of
> these fields
> int[] aggregateOrdinals = new int[aggregateFieldNames.size()];
> for (int i = 0; i < aggregateOrdinals.length; i++) {
> aggregateOrdinals[i] =
> this.getOrdinalForListItem(aggregateFieldNames, i);
> }
>
> //Now we go and find all the ordinals in the matching documents.
> //For each ordinal, we check if it is a groupBy-ordinal or an
> aggregate-ordinal, and act accordingly.
> for (MatchingDocs hitList : matchingDocs) {
> BinaryDocValues dv =
> hitList.context.reader().getBinaryDocValues(this.indexFieldName);
>
> //Here, find the ordinals of the group-by fields and the
> aggregate fields.
> //Create a multi-ordinal key (MultiIntKey) from the
> group-by ordinals and use that to add the current value of the field
> to the facet aggregates
>
> ..
>
>
> Hope this helps :)
> -Rob
>
>

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: GROUP BY in Lucene

2016-03-19 Thread Gimantha Bandara
Hi Rob,

Thank you for explaining your approach. I still have a few questions. Do I
need to store the values being aggregated as STORED fields at indexing time?
And how does the collector handle a large number of documents when
aggregating? I mean, let's say I have several million documents in an index
and I want to compute the SUM of a field called "subject_marks". How does the
collector handle the summation efficiently? Does it go through all the
segments in parallel, or something like that?

For now we have a facet field which has X, Y and Z, so I can get the
documents that belong to a specific X/Y/Z group and perform aggregation over
those records. I can actually do that for all the groups, but it is not fast.
It is a simple Java loop that goes through all the different facet values,
aggregates the document values belonging to each facet value, and puts them
into a map. It is slow because we do not store field values in the Lucene
documents; we fetch the actual data from a DB. We only keep an ID as a STORED
field in the Lucene documents; once we get those IDs from the Lucene
documents, we look them up in the DB and perform the aggregation. This is
really slow when the number of records grows.
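For the DB-lookup bottleneck, the usual alternative is to index the value being summed as doc values and aggregate during collection, so neither stored fields nor DB round-trips are needed. A sketch against the Lucene 5.x API, assuming the documents were indexed with `new NumericDocValuesField("subject_marks", value)` (the field name is taken from the question above):

```java
import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.SimpleCollector;

// Sketch: sum a field during collection from its doc-values column,
// with no stored-field access and no external DB lookups.
public final class SumCollector extends SimpleCollector {
    private NumericDocValues marks;
    private long sum;

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        marks = context.reader().getNumericDocValues("subject_marks");
    }

    @Override
    public void collect(int doc) {
        if (marks != null) {
            sum += marks.get(doc); // columnar read, no document fetch
        }
    }

    @Override
    public boolean needsScores() { return false; }

    public long sum() { return sum; }
}
```

Collection proceeds segment by segment on one thread by default; the win comes from the columnar doc-values read, not from parallelism.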

Thanks,
Gimantha

On Mon, Aug 10, 2015 at 6:26 PM, Rob Audenaerde 
wrote:

> You can write a custom (facet) collector to do this. I have done something
> similar, I'll describe my approach:
>
> For all the values that need grouping or aggregating, I have added a
> FacetField (an AssociationFacetField, so I can store the value alongside
> the ordinal). The main search stays the same, in your case for example a
> NumericRangeQuery (if the date is stored in ms).
>
> Then I have a custom facet collector that does the grouping.
>
> Basically, it goes through all the MatchingDocs. For each doc, it creates a
> unique key (composed of X, Y and Z) and computes the aggregates as needed (sum of
> D). These are stored in a map; if a key is already in the map, the existing
> aggregate is added to the new value. The tricky part is making your unique key
> fast and immutable, so you can precompute the hash code.
>
> This is fast enough if the number of unique keys is smallish (<10,000,
> index size +- 1M docs).
>
> -Rob
>
>
> On Mon, Aug 10, 2015 at 2:47 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
> > Lucene has a grouping module that has several approaches for grouping
> > search hits, though it's only by a single field I believe.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Sun, Aug 9, 2015 at 2:55 PM, Gimantha Bandara 
> > wrote:
> > > Hi all,
> > >
> > > Is there a way to achieve $subject? For example, consider the following
> > SQL
> > > query.
> > >
> > > SELECT A, B, C, SUM(D) AS E FROM `table`
> > > WHERE time BETWEEN fromDate AND toDate *GROUP BY X,Y,Z*
> > >
> > > In the above query we can group the records by, X,Y,Z. Is there a way
> to
> > > achieve the same in Lucene? (I guess Faceting would help, But is it
> > > possible get all the categoryPaths along with the matching records? )
> Is
> > > there any other way other than using Facets?
> > >
> > > --
> > > Gimantha Bandara
> > > Software Engineer
> > > WSO2. Inc : http://wso2.com
> > > Mobile : +94714961919
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Lucene 5.0.0 - StringField and Sorting

2016-04-21 Thread Gimantha Bandara
Hi Torsten,

Did you find a solution for this? I am having the same issue. I am planning
to create a custom Field with DocValuesType.SORTED. Is there any other way to
do this without creating a custom Field?

On Fri, Mar 6, 2015 at 3:34 PM, Torsten Krah  wrote:

> Hi,
>
> looking at the JavaDoc of StringField it says:
>
> /** A field that is indexed but not tokenized: the entire
>  *  String value is indexed as a single token.  For example
>  *  this might be used for a 'country' field or an 'id'
>  *  field, or any field that you intend to use for sorting
>  *  or access through the field cache. */
>
> So i intend to use some StringFields for sorting.
> However trying to sort on them fails with:
>
> java.lang.IllegalStateException: unexpected docvalues type NONE for
> field 'NAME_KEYWORD' (expected=SORTED).
>
> Was indexed as StringField and Store.YES.
>
> So is the JavaDoc wrong here or is it correct and StringField should
> set:
>
> TYPE.setDocValuesType(DocValuesType.SORTED);
>
> so it would work?
>
> kind regards
>
> Torsten
>
>
>
>
>
>
>


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Lucene 5.0.0 - StringField and Sorting

2016-04-25 Thread Gimantha Bandara
Yep, adding a SortedDocValuesField did work for me! Thanks.
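For readers hitting the same error, a sketch of the resolution (assuming Lucene 5.x; the field name is taken from the exception message in this thread):

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.util.BytesRef;

// Sketch: index the same value twice, once as a StringField for matching
// and storage, once as a SortedDocValuesField so field sorting has the
// doc values it expects.
public final class SortableStringDoc {
    static void addSortable(IndexWriter writer, String field, String value)
            throws IOException {
        Document doc = new Document();
        doc.add(new StringField(field, value, Field.Store.YES));
        doc.add(new SortedDocValuesField(field, new BytesRef(value)));
        writer.addDocument(doc);
    }
    // Sorting then works without "unexpected docvalues type NONE":
    //   Sort sort = new Sort(new SortField("NAME_KEYWORD", SortField.Type.STRING));
}
```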

On Mon, Apr 25, 2016 at 8:39 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> The Lucene level javadocs are definitely stale ... I'll fix.
>
> You should separately add a SortedDocValuesField if you also need to sort
> on this field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


setRAMBufferSizeMB and setRAMPerThreadHardLimitMB

2016-07-28 Thread Gimantha Bandara
Hi all,

Can someone explain what these methods do? Why are there two different
methods, one per thread and one for all the documents together? The default
RAMBufferSize is 16 MB and the per-thread hard limit is 1945 MB. What will
happen if I set the buffer size to 2048 MB? Will the docs be flushed to the
directory when each thread reaches 1945 MB?
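A sketch of how the two knobs relate, as I read the javadocs (treat the details as approximate): `setRAMBufferSizeMB` is the global flush trigger summed across all indexing threads, while `setRAMPerThreadHardLimitMB` caps a single thread's in-memory segment, which must stay under 2048 MB because its buffer is int-addressed. So even with the global buffer at 2048 MB, any single thread that reaches the 1945 MB hard limit is flushed on its own.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

// Sketch: the two RAM knobs on IndexWriterConfig (values are examples).
public final class WriterTuning {
    static IndexWriterConfig tunedConfig() {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setRAMBufferSizeMB(256.0);        // global flush trigger across threads
        config.setRAMPerThreadHardLimitMB(1024); // hard cap per indexing thread (DWPT)
        return config;
    }
}
```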

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Difference between CategoryPath and Plain FacetFields with hierarchy

2015-03-04 Thread Gimantha Bandara
Hi,

I am new to Lucene faceting and taxonomy. I saw a few examples in some blogs
and in the facets guide. Some use CategoryPath with TaxonomyWriters,
TaxonomyReaders and FacetSearchParams; others use FacetFields without
TaxonomyWriters and TaxonomyReaders. What is the difference between these two
approaches, and which one is recommended for building a faceted search
application? For the two approaches, see [1] and [2].

[1] http://www.hascode.com/2012/08/lucene-snippets-faceting-search/
[2] http://www.norconex.com/facets-with-lucene/

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Difference between CategoryPath and Plain FacetFields with hierarchy

2015-03-04 Thread Gimantha Bandara
Hi,

Any help on this? Or can someone point me to the Facets user guide for
4.10.3? I cannot find it. Is it only available for older versions?




-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Difference between CategoryPath and Plain FacetFields with hierarchy

2015-03-05 Thread Gimantha Bandara
Thanks Michael,

I am going to use FacetField/FacetsConfig. Using FacetField, we can define a
hierarchy like below.

doc.add(new FacetField("Publish Date", "2012", "1", "7"))


Is it possible to use FacetField like this, as we could with CategoryPath?

doc.add(new FacetField("Publish Date", "2012/1/7"))
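For reference, a sketch of the difference (assuming the current FacetField API): the path components must be passed separately and the dimension marked hierarchical; a single "2012/1/7" string would be indexed as one flat label, not as a three-level path.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetsConfig;

// Sketch: hierarchy comes from separate path components plus a
// hierarchical dimension in FacetsConfig.
public final class PublishDateFacet {
    static Document withFacet(FacetsConfig config) {
        config.setHierarchical("Publish Date", true);
        Document doc = new Document();
        doc.add(new FacetField("Publish Date", "2012", "1", "7"));
        return doc; // pass through config.build(taxoWriter, doc) before indexing
    }
}
```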




On Thu, Mar 5, 2015 at 8:17 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> The facets API changed in 5.0, from CategoryPath/FacetSearchParams to
> FacetFields/FacetsConfig.
>
> Mike McCandless
>
> http://blog.mikemccandless.com


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Sampled Hit counts using Lucene Facets.

2015-03-06 Thread Gimantha Bandara
Hi,

I am trying to create some APIs on top of the Lucene facets APIs. First I
will explain my requirement with an example. Let's say I am keeping track of
the count of people who enter through a certain door, and the time range I am
interested in is the last 6 hours (to get the total count, I know I'll have
to use range facets). How do I sample this time range and get the counts for
each sample? In other words, if I split the last 6 hours into 5-minute
samples, I get 72 (6*60/5) different time ranges. I would like to get the hit
counts for each of these 72 ranges in an array, together with the respective
lower bound of each sample. Can someone point me in the direction I should
follow, or to the classes worth looking at? Elasticsearch already exposes
this feature through its JavaScript API.

Is it possible to implement the same with Lucene?
Is there a Facets user guide for Lucene 4.10.3 or Lucene 5.0.0?

Thanks,

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Sampled Hit counts using Lucene Facets.

2015-03-07 Thread Gimantha Bandara
Any help on this please?




-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Sampled Hit counts using Lucene Facets.

2015-03-09 Thread Gimantha Bandara
Any updates on this, please? Do I have to write my own code to sample and get
the hit count?




-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Sampled Hit counts using Lucene Facets.

2015-03-10 Thread Gimantha Bandara
What I am planning to do is split the given time range into smaller time
ranges myself, pass them to a LongRangeFacetCounts object, and get the counts
for each sub-range. Is this the correct way?
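A sketch of that approach: the bucket arithmetic is plain Java, and each [lo, hi) pair becomes a LongRange (the Lucene wiring is indicated in comments since it needs a live index and collector; the field name "timestamp" is hypothetical):

```java
// Sketch: split [min, max) into equal-width buckets for range faceting.
public final class TimeBuckets {
    /** Returns numBuckets [lo, hi) pairs covering [min, max). */
    static long[][] split(long min, long max, int numBuckets) {
        long[][] buckets = new long[numBuckets][2];
        long width = (max - min) / numBuckets; // remainder goes to the last bucket
        for (int i = 0; i < numBuckets; i++) {
            buckets[i][0] = min + i * width;
            buckets[i][1] = (i == numBuckets - 1) ? max : min + (i + 1) * width;
        }
        return buckets;
    }
    // With Lucene facets:
    //   LongRange[] ranges = new LongRange[buckets.length];
    //   for (int i = 0; i < ranges.length; i++)
    //       ranges[i] = new LongRange("t" + i, buckets[i][0], true, buckets[i][1], false);
    //   Facets facets = new LongRangeFacetCounts("timestamp", facetsCollector, ranges);
    //   FacetResult counts = facets.getTopChildren(ranges.length, "timestamp");
}
```

For the 6-hour / 5-minute example this yields 72 ranges of 300,000 ms each, and each bucket's lower bound is `buckets[i][0]`.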




-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Lucene index

2015-03-10 Thread Gimantha Bandara
Please take a look at this [1]. I am not sure it qualifies as a graph
database, though.

[1] http://lucene.apache.org/core/4_10_3/demo/overview-summary.html

On Tue, Mar 10, 2015 at 2:56 PM, Noora Alalawi 
wrote:

> Hello dears
>
> Please I want your help.
> I need a simple example for add and index synonym in lucene PLEASE
>
>
> Also, is lucene index graph database or not?
>
>
>
> Thank U
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Sampled Hit counts using Lucene Facets.

2015-03-10 Thread Gimantha Bandara
Hi Shai,

Yes, splitting ranges into smaller ranges is not the same as sampling; I used
the wrong word there. I think RandomSamplingFacetsCollector is for "sampling"
a larger dataset, and that class cannot be used to implement the example
described above. I think I'll have to prepare the ranges manually and pass
them to LongRangeFacetCounts.

On Tue, Mar 10, 2015 at 4:54 PM, Shai Erera  wrote:

> I am not sure that splitting the ranges into smaller ranges is the same as
> sampling.
>
> Take a look RandomSamplingFacetsCollector - it implements sampling by
> sampling the document space, not the facet values space.
>
> So if for instance you use a LongRangeFacetCounts in conjunction with a
> RandomSamplingFacetsCollector, you would get the matching documents space
> sampled, and the counts you would get for each range could be considered
> "sampled" too. This is at least how we implemented facet sampling.
>
> Shai
>
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Difference between StoredField vs Other Fields with Field.Store.YES

2015-03-11 Thread Gimantha Bandara
Hi all,

Is there a difference between using StoredField and using other field types
with Field.Store.YES?

Another question: is it good practice to use NumericDocValuesField instead of
the usual fields (IntField, LongField, StringField, etc.) with
Field.Store.NO?
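A sketch of the distinction as I understand the field classes (the field name "price" is hypothetical): `StoredField` is the stored-only part, essentially what `Field.Store.YES` adds to an indexed field, while doc-values fields are a separate columnar representation used for sorting and faceting, neither term-searchable nor returned by `document()`. Doc values complement indexed fields rather than replace them.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;

// Sketch of the three roles a numeric value can play in one document.
public final class FieldRoles {
    static Document example() {
        Document doc = new Document();
        doc.add(new StoredField("price", 42L));               // retrievable only, not searchable
        doc.add(new LongField("price", 42L, Field.Store.NO)); // indexed for (range) queries
        doc.add(new NumericDocValuesField("price", 42L));     // sortable/aggregatable column
        return doc;
    }
}
```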
-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Sampled Hit counts using Lucene Facets.

2015-03-11 Thread Gimantha Bandara
Hi Shai,

Yes, bucketing is the word :) IMO it would be better if the bucketing were
moved to a utility class. I'll create a JIRA and provide a patch.

Thanks!

On Wed, Mar 11, 2015 at 4:33 PM, Shai Erera  wrote:

> OK, yes, then sampling isn't the right word. So what you would want to have
> is an API like "count facets in N buckets between a range of [min..max]
> values". That would create the ranges for you, and then you would be able to
> use the RangeFacetCounts as usual.
>
> Would you like to open a JIRA issue and post a patch? I guess it can either
> be an additional constructor on LongRangeFacetCounts (and Double), or a
> separate utility class which given min/max values and numBuckets, creates
> the proper Range[]?
>
> Shai
>
> On Tue, Mar 10, 2015 at 4:07 PM, Gimantha Bandara 
> wrote:
>
> > Hi Shai,
> >
> > Yes, splitting ranges into smaller ranges is not the same as sampling; I
> > used the wrong word there. I think RandomSamplingFacetsCollector is
> > for "sampling" a larger dataset, and that class cannot be used to
> > implement
> > the example described above. I think I'll have to prepare the Ranges
> > manually and pass them to LongRangeFacetCounts.
> >
> > On Tue, Mar 10, 2015 at 4:54 PM, Shai Erera  wrote:
> >
> > > I am not sure that splitting the ranges into smaller ranges is the same
> > as
> > > sampling.
> > >
> > > Take a look RandomSamplingFacetsCollector - it implements sampling by
> > > sampling the document space, not the facet values space.
> > >
> > > So if for instance you use a LongRangeFacetCounts in conjunction with a
> > > RandomSamplingFacetsCollector, you would get the matching documents
> space
> > > sampled, and the counts you would get for each range could be
> considered
> > > "sampled" too. This is at least how we implemented facet sampling.
> > >
> > > Shai
> > >
> > > On Tue, Mar 10, 2015 at 10:21 AM, Gimantha Bandara 
> > > wrote:
> > >
> > > > What I am planning to do is, split the given time range into smaller
> > time
> > > > ranges  by myself and pass them to a LongRangeFacetsCount object and
> > get
> > > > the counts for each sub range. Is this the correct way?
> > > >
> > > > On Tue, Mar 10, 2015 at 12:01 AM, Gimantha Bandara <
> giman...@wso2.com>
> > > > wrote:
> > > >
> > > > > Any updates on this please? Do I have to write my own code to
> sample
> > > and
> > > > > get the hitcount?
> > > > >
> > > > > On Sat, Mar 7, 2015 at 2:14 PM, Gimantha Bandara <
> giman...@wso2.com>
> > > > > wrote:
> > > > >
> > > > >> Any help on this please?
> > > > >>
> > > > >> On Fri, Mar 6, 2015 at 3:13 PM, Gimantha Bandara <
> giman...@wso2.com
> > >
> > > > >> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I am trying to create some APIs using lucene facets APIs. First I
> > > will
> > > > >>> explain my requirement with an example. Lets say I am keeping
> track
> > > of
> > > > the
> > > > >>> count of  people who enter through a certain door. Lets say the
> > time
> > > > range
> > > > >>> I am interested in Last 6 hours( to get the total count, I know
> > that
> > > I
> > > > ll
> > > > >>> have to use Ranged Facets). How do I sample this time range and
> get
> > > the
> > > > >>> counts of each sample? In other words, as an example, If I split
> > the
> > > > last
> > > > >>> 6 hours into 5 minutes samples, I get 72 (6*60/5 ) different time
> > > > ranges. I
> > > > >>> would be interested in getting hit counts for each of these 72
> > ranges
> > > > in an
> > > > >>> array with the respective lower bound of each sample. Can someone
> > > > point me
> > > > >>> the direction I should follow/ the classes which can be helpful
> > > > looking at?
> > > > >>> ElasticSearch already has this feature exposed by their
> Javascript
> > > API.
> > > > >>>
> > > > >>> Is it possible to implement the same with lucene?
> > > > >>> Is there a Facets user guide for lucene 4.10.3 or lucene 5.0.0 ?
> > > > >>>
> > > > >>> Thanks,
> > > > >>>
> > > > >>> --
> > > > >>> Gimantha Bandara
> > > > >>> Software Engineer
> > > > >>> WSO2. Inc : http://wso2.com
> > > > >>> Mobile : +94714961919
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Gimantha Bandara
> > > > >> Software Engineer
> > > > >> WSO2. Inc : http://wso2.com
> > > > >> Mobile : +94714961919
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Gimantha Bandara
> > > > > Software Engineer
> > > > > WSO2. Inc : http://wso2.com
> > > > > Mobile : +94714961919
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Gimantha Bandara
> > > > Software Engineer
> > > > WSO2. Inc : http://wso2.com
> > > > Mobile : +94714961919
> > > >
> > >
> >
> >
> >
> > --
> > Gimantha Bandara
> > Software Engineer
> > WSO2. Inc : http://wso2.com
> > Mobile : +94714961919
> >
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Would Like to contribute to Lucene

2015-03-12 Thread Gimantha Bandara
Hi all,

I would like to contribute to the Lucene project. I have been referring to
"Lucene in Action", 2nd edition, but I think it is outdated; it is based on
Lucene 3.0.x, I believe. Even with online resources, it is very hard to
learn the internals of Lucene because of the lack of up-to-date material.
Can someone recommend a recently released book on Lucene internals, or has
someone planned to write one? What would be a good starting point for
learning the internals of Lucene?

Thanks,

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Would Like to contribute to Lucene

2015-03-19 Thread Gimantha Bandara
Any pointers on where to start?

On Fri, Mar 13, 2015 at 11:24 AM, Gimantha Bandara 
wrote:

> Hi all,
>
> I am willing to contribute to Lucene project. I have already been
> referring to "Lucene in Action" 2nd edition recently. But I think it is
> outdated. It is based on lucene 3.0.x I guess. Even through online
> resources, it is very hard to learn the internals of lucene because of the
> lack of up-to-date resources. Can someone recommend a recently released
> book on lucene internals or has someone planned to write one? What would be
> the starting point if I need to learn the internals of Lucene?
>
> Thanks,
>
> --
> Gimantha Bandara
> Software Engineer
> WSO2. Inc : http://wso2.com
> Mobile : +94714961919
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


How to merge several Taxonomy indexes

2015-03-23 Thread Gimantha Bandara
Hi all,

Can anyone point me to how to merge several taxonomy indexes? My requirement
is as follows: I have several taxonomy indexes and regular document indexes.
I want to merge the taxonomy indexes together and the document indexes
together, and perform searches on them. One part I have figured out, and it
is easy: to merge document indexes, all I have to do is create a MultiReader
and pass it to an IndexSearcher. But I am stuck at merging the taxonomy
indexes. Is there a way to merge taxonomy indexes?

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How to merge several Taxonomy indexes

2015-03-23 Thread Gimantha Bandara
Hi Christoph,

I think TaxonomyMergeUtils merges a taxonomy directory and a document index
together (correct me if I am wrong). Can it be used to merge several
taxonomy directories together into one taxonomy index?

On Mon, Mar 23, 2015 at 9:19 PM, Christoph Kaser 
wrote:

> Hi Gimantha,
>
> have a look at the class org.apache.lucene.facet.taxonomy.TaxonomyMergeUtils,
> which does exactly what you need.
>
> Best regards,
> Christoph
>
> Am 23.03.2015 um 15:44 schrieb Gimantha Bandara:
>
>> Hi all,
>>
>> Can anyone point me how to merge several taxonomy indexes? My requirement
>> is as follows. I have  several taxonomy indexes and normal document
>> indexes. I want to merge taxonomy indexes together and other document
>> indexes together and perform search on them. One part I have figured out.
>> It is easy. To Merge document indexes, all I have to do is create a
>> MultiReader and pass it to IndexSearcher. But I am stuck at merging the
>> taxonomy indexes. Is there a way to merge taxonomy indexes?
>>
>>
>
> --
> Dipl.-Inf. Christoph Kaser
>
> IconParc GmbH
> Sophienstrasse 1
> 80333 München
>
> www.iconparc.de
>
> Tel +49 -89- 15 90 06 - 21
> Fax +49 -89- 15 90 06 - 49
>
> Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
> 121830, Amtsgericht München
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How to merge several Taxonomy indexes

2015-03-23 Thread Gimantha Bandara
Hi Christoph,

My mistake. :) It does exactly what I need; I figured it out later.
Thanks a lot!

On Tue, Mar 24, 2015 at 3:14 AM, Gimantha Bandara  wrote:

> Hi Christoph,
>
> I think TaxonomyMergeUtils is to merge a taxonomy directory and an index
> together (Correct me if I am wrong). Can it be used to merge several
> taxonomyDirectories together and create one taxonomy index?
>
> On Mon, Mar 23, 2015 at 9:19 PM, Christoph Kaser 
> wrote:
>
>> Hi Gimantha,
>>
>> have a look at the class org.apache.lucene.facet.taxonomy.TaxonomyMergeUtils,
>> which does exactly what you need.
>>
>> Best regards,
>> Christoph
>>
>> Am 23.03.2015 um 15:44 schrieb Gimantha Bandara:
>>
>>> Hi all,
>>>
>>> Can anyone point me how to merge several taxonomy indexes? My requirement
>>> is as follows. I have  several taxonomy indexes and normal document
>>> indexes. I want to merge taxonomy indexes together and other document
>>> indexes together and perform search on them. One part I have figured out.
>>> It is easy. To Merge document indexes, all I have to do is create a
>>> MultiReader and pass it to IndexSearcher. But I am stuck at merging the
>>> taxonomy indexes. Is there a way to merge taxonomy indexes?
>>>
>>>
>>
>> --
>> Dipl.-Inf. Christoph Kaser
>>
>> IconParc GmbH
>> Sophienstrasse 1
>> 80333 München
>>
>> www.iconparc.de
>>
>> Tel +49 -89- 15 90 06 - 21
>> Fax +49 -89- 15 90 06 - 49
>>
>> Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
>> 121830, Amtsgericht München
>>
>>
>>
>> ---------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
> --
> Gimantha Bandara
> Software Engineer
> WSO2. Inc : http://wso2.com
> Mobile : +94714961919
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Getting the doc values grouped by Facets

2015-03-26 Thread Gimantha Bandara
Hi,

I have some Lucene documents indexed that contain facet fields. I wrote a
drill-down query, and by using getTopChildren I can get the facet labels and
their values/counts. I am wondering whether it is also possible to get the
documents under each facet, so I can list the documents grouped by facet. Is
that possible?
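The grouping step itself can be sketched without Lucene. In real code the label/doc pairs would come from the collected hits and their facet values; everything below (class, method, sample labels) is hypothetical and only illustrates the bookkeeping:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByFacet {
    // Group matching doc ids under their facet label, preserving the order
    // in which labels are first seen.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> hits) {
        Map<String, List<Integer>> byFacet = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> hit : hits) {
            byFacet.computeIfAbsent(hit.getKey(), k -> new ArrayList<>()).add(hit.getValue());
        }
        return byFacet;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> hits = List.of(
            Map.entry("author/shai", 1),
            Map.entry("author/mike", 2),
            Map.entry("author/shai", 5));
        // Prints: {author/shai=[1, 5], author/mike=[2]}
        System.out.println(group(hits));
    }
}
```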


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Would Like to contribute to Lucene

2015-03-27 Thread Gimantha Bandara
Hi all,

Thanks a lot for your suggestions!
I believe the current lucene code repository is
https://github.com/apache/lucene-solr/tree/trunk/lucene

On Sat, Mar 28, 2015 at 5:49 AM, Jack Krupansky 
wrote:

> +1 for starting with unit tests - they show you how things work, give you
> something to step through in the debugger, are (or should be!) always
> current, and are a great place to start for contributing code, like
> improving coverage and optimizing coverage. Commenting code and enhancing
> the Javadoc is always a great contribution.
>
>
> -- Jack Krupansky
>
> On Thu, Mar 26, 2015 at 8:15 PM, Erick Erickson 
> wrote:
>
> > You really have to just pick a problem, dive into the code and learn
> > it bit by bit through exploration. The code base changes fast enough
> > that anything published will be out of date in short order.
> >
> > Here's a suggestion: Take a look at the coverage reports for unit
> > tests, pick some code that doesn't have coverage and write a test.
> > Believe me, that'll get you familiar with _something_ pretty quickly,
> > and something like that provides  a focus. It's a mistake to try to
> > understand all of Lucene IMO, that'll take years.
> >
> > FWIW,
> > Erick
> >
> > On Thu, Mar 26, 2015 at 4:42 PM, Adrien Grand  wrote:
> > > Hi Gimantha,
> > >
> > > There is no recent book. However, there is some interesting content
> > > that you can find about Lucene and Solr internals scattered in blog
> > > posts and conference presentations. I would recommend having a look at
> > > Mike's blog http://blog.mikemccandless.com/ and videos of Lucene
> > > Revolution, ApacheCon and BerlinBuzzwords which regularly get a fair
> > > amount of Lucene/Solr-related talks.
> > >
> > > On Fri, Mar 13, 2015 at 6:54 AM, Gimantha Bandara 
> > wrote:
> > >> Hi all,
> > >>
> > >> I am willing to contribute to Lucene project. I have already been
> > referring
> > >> to "Lucene in Action" 2nd edition recently. But I think it is
> outdated.
> > It
> > >> is based on lucene 3.0.x I guess. Even through online resources, it is
> > very
> > >> hard to learn the internals of lucene because of the lack of
> up-to-date
> > >> resources. Can someone recommend a recently released book on lucene
> > >> internals or has someone planned to write one? What would be the
> > starting
> > >> point if I need to learn the internals of Lucene?
> > >>
> > >> Thanks,
> > >>
> > >> --
> > >> Gimantha Bandara
> > >> Software Engineer
> > >> WSO2. Inc : http://wso2.com
> > >> Mobile : +94714961919
> > >
> > >
> > >
> > > --
> > > Adrien
> > >
> > > -
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: for check similarity of two sentences

2015-04-02 Thread Gimantha Bandara
Hi Heshan,
I think you can achieve what you are looking for. You may want to read about
the Lucene scoring system and FuzzyQuery in "Lucene in Action", 2nd edition.
Hope this helps; maybe someone can suggest a better approach.
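For background, FuzzyQuery matches terms within a Levenshtein edit distance of the query term. A minimal self-contained sketch of that distance metric (Lucene's real implementation uses a much faster Levenshtein automaton, so this is only to show the concept):

```java
public class EditDistance {
    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine")); // prints 1
    }
}
```

Note that FuzzyQuery works at the term level; for whole-sentence similarity the scoring of a regular multi-term query is usually the more useful signal.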

On Wed, Apr 1, 2015 at 8:14 AM, hesh jay  wrote:

> hi,
> I am a second-year undergraduate at the University of Moratuwa, Sri Lanka.
> For my second-year project I am building a question-answering system
> (knowledge base). In this project I have to suggest similar questions
> previously asked by other users, so I need to find the similarity of two
> sentences in my application. Can I do that using Apache Lucene?
> Thank you!
> regards,
> Heshan jayasinghe
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Hi All,

I have successfully set up the merged indices, and drill-down and regular
search operations work perfectly. But I have a side question: if I select a
RAMDirectory as the merge destination, the JVM can probably run out of
memory if the merged indices are too big. Is there a way I can handle this
issue?

On Tue, Mar 24, 2015 at 12:18 PM, Gimantha Bandara 
wrote:

> Hi Christoph,
>
> My mistake. :) It does the exactly what i need. figured it out later..
> Thanks a lot!
>
> On Tue, Mar 24, 2015 at 3:14 AM, Gimantha Bandara 
> wrote:
>
>> Hi Christoph,
>>
>> I think TaxonomyMergeUtils is to merge a taxonomy directory and an index
>> together (Correct me if I am wrong). Can it be used to merge several
>> taxonomyDirectories together and create one taxonomy index?
>>
>> On Mon, Mar 23, 2015 at 9:19 PM, Christoph Kaser > > wrote:
>>
>>> Hi Gimantha,
>>>
>>> have a look at the class 
>>> org.apache.lucene.facet.taxonomy.TaxonomyMergeUtils,
>>> which does exactly what you need.
>>>
>>> Best regards,
>>> Christoph
>>>
>>> Am 23.03.2015 um 15:44 schrieb Gimantha Bandara:
>>>
>>>> Hi all,
>>>>
>>>> Can anyone point me how to merge several taxonomy indexes? My
>>>> requirement
>>>> is as follows. I have  several taxonomy indexes and normal document
>>>> indexes. I want to merge taxonomy indexes together and other document
>>>> indexes together and perform search on them. One part I have figured
>>>> out.
>>>> It is easy. To Merge document indexes, all I have to do is create a
>>>> MultiReader and pass it to IndexSearcher. But I am stuck at merging the
>>>> taxonomy indexes. Is there a way to merge taxonomy indexes?
>>>>
>>>>
>>>
>>> --
>>> Dipl.-Inf. Christoph Kaser
>>>
>>> IconParc GmbH
>>> Sophienstrasse 1
>>> 80333 München
>>>
>>> www.iconparc.de
>>>
>>> Tel +49 -89- 15 90 06 - 21
>>> Fax +49 -89- 15 90 06 - 49
>>>
>>> Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer.
>>> HRB
>>> 121830, Amtsgericht München
>>>
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>
>>
>> --
>> Gimantha Bandara
>> Software Engineer
>> WSO2. Inc : http://wso2.com
>> Mobile : +94714961919
>>
>
>
>
> --
> Gimantha Bandara
> Software Engineer
> WSO2. Inc : http://wso2.com
> Mobile : +94714961919
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Hi Christoph and Shai,

Thanks for the quick response!
The indices are stored in a relational database (using a custom Directory
implementation). The problem arises because the indices are sharded (both
taxonomy indices and regular doc indices): when a user wants to drill down,
I have to merge all the indices. For that I used TaxonomyMergeUtils, which
works perfectly. For now I am using a RAMDirectory for the merged indices,
but the indices can grow bigger as time goes on. MMapDirectory again uses
memory, right? Can it deal with a possible out-of-memory issue?

I am thinking of using the same database to store the merged indices. But
the problem is that the original sharded indices can be updated when new
entries come in, so the merged indices also need to be updated accordingly.

On Thu, Apr 2, 2015 at 4:55 PM, Shai Erera  wrote:

> In some cases, MMapDirectory offers even better performance, since the JVM
> doesn't need to manage that RAM when it's doing GC.
>
> Also, using only RAMDirectory is not safe in that if the JVM crashes, your
> index is lost.
>
> On Thu, Apr 2, 2015 at 12:54 PM, Christoph Kaser 
> wrote:
>
> > Hi Gimantha,
> >
> > why do you use a RAMDirectory? If your merged index fits into RAM
> > completely, a MMapDirectory should offer almost the same performance. And
> > if not, it is definitely the better choice.
> >
> > Regards
> > Christoph
> >
> >
> > Am 02.04.2015 um 12:38 schrieb Gimantha Bandara:
> >
> >> Hi All,
> >>
> >> I have successfully setup a merged indices and drilldown and usual
> search
> >> operations work perfect.
> >> But, I have a side question. If I selected RAMDirectory as the
> destination
> >> Indices in merging, probably the jvm can go out of memory if the merged
> >> indices are too big. Is there a way I can handle this issue?
> >>
> >> On Tue, Mar 24, 2015 at 12:18 PM, Gimantha Bandara 
> >> wrote:
> >>
> >>  Hi Christoph,
> >>>
> >>> My mistake. :) It does the exactly what i need. figured it out later..
> >>> Thanks a lot!
> >>>
> >>> On Tue, Mar 24, 2015 at 3:14 AM, Gimantha Bandara 
> >>> wrote:
> >>>
> >>>  Hi Christoph,
> >>>>
> >>>> I think TaxonomyMergeUtils is to merge a taxonomy directory and an
> index
> >>>> together (Correct me if I am wrong). Can it be used to merge several
> >>>> taxonomyDirectories together and create one taxonomy index?
> >>>>
> >>>> On Mon, Mar 23, 2015 at 9:19 PM, Christoph Kaser <
> >>>> lucene_l...@iconparc.de
> >>>>
> >>>>> wrote:
> >>>>> Hi Gimantha,
> >>>>>
> >>>>> have a look at the class org.apache.lucene.facet.
> >>>>> taxonomy.TaxonomyMergeUtils,
> >>>>> which does exactly what you need.
> >>>>>
> >>>>> Best regards,
> >>>>> Christoph
> >>>>>
> >>>>> Am 23.03.2015 um 15:44 schrieb Gimantha Bandara:
> >>>>>
> >>>>>  Hi all,
> >>>>>>
> >>>>>> Can anyone point me how to merge several taxonomy indexes? My
> >>>>>> requirement
> >>>>>> is as follows. I have  several taxonomy indexes and normal document
> >>>>>> indexes. I want to merge taxonomy indexes together and other
> document
> >>>>>> indexes together and perform search on them. One part I have figured
> >>>>>> out.
> >>>>>> It is easy. To Merge document indexes, all I have to do is create a
> >>>>>> MultiReader and pass it to IndexSearcher. But I am stuck at merging
> >>>>>> the
> >>>>>> taxonomy indexes. Is there a way to merge taxonomy indexes?
> >>>>>>
> >>>>>>
> >>>>>>  --
> >>>>> Dipl.-Inf. Christoph Kaser
> >>>>>
> >>>>> IconParc GmbH
> >>>>> Sophienstrasse 1
> >>>>> 80333 München
> >>>>>
> >>>>> www.iconparc.de
> >>>>>
> >>>>> Tel +49 -89- 15 90 06 - 21
> >>>>> Fax +49 -89- 15 90 06 - 49
> >>>>>
> >>>>> Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven
> Angerer.
> >>>>> HRB
> >>>>> 121830, Amtsgericht München
> >>>>>
> >>>>>
> >>>>>
> >>>>> -
> >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>
> >>>>>
> >>>>>  --
> >>>> Gimantha Bandara
> >>>> Software Engineer
> >>>> WSO2. Inc : http://wso2.com
> >>>> Mobile : +94714961919
> >>>>
> >>>>  --
> >>> Gimantha Bandara
> >>> Software Engineer
> >>> WSO2. Inc : http://wso2.com
> >>> Mobile : +94714961919
> >>>
> >>>
> >>
> >
> > --
> > Dipl.-Inf. Christoph Kaser
> >
> > IconParc GmbH
> > Sophienstrasse 1
> > 80333 München
> >
> > www.iconparc.de
> >
> > Tel +49 -89- 15 90 06 - 21
> > Fax +49 -89- 15 90 06 - 49
> >
> > Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer.
> HRB
> > 121830, Amtsgericht München
> >
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Btw, I was using a RAMDirectory just for testing purposes.

On Thu, Apr 2, 2015 at 5:16 PM, Gimantha Bandara  wrote:

> Hi Christoph and Shai,
>
> Thanks for the quick response!.
> Indices are stored in a relational database ( using a custom Directory
> implementation ). The Problem comes since the indices are sharded (both
> taxonomy indices and normal doc indices), when a user wants to drilldown, I
> have to merge all the indices. For that I used mergeUtils (which
> worksperfect). For now I am using RAMDirectory as the merged indices.
> Anyway The indices can grow to a bigger size as time goes. MMapDirectory
> again uses memory right? Can It deal with possible out of memory issue?
>
> I am thinking of using the same Database to store the merged indices. But
> the problem is the original sharded indices can be updated, when new
> entries come in. So the merged final indices also needs to be updated
> accordingly.
>
> On Thu, Apr 2, 2015 at 4:55 PM, Shai Erera  wrote:
>
>> In some cases, MMapDirectory offers even better performance, since the JVM
>> doesn't need to manage that RAM when it's doing GC.
>>
>> Also, using only RAMDirectory is not safe in that if the JVM crashes, your
>> index is lost.
>>
>> On Thu, Apr 2, 2015 at 12:54 PM, Christoph Kaser > >
>> wrote:
>>
>> > Hi Gimantha,
>> >
>> > why do you use a RAMDirectory? If your merged index fits into RAM
>> > completely, a MMapDirectory should offer almost the same performance.
>> And
>> > if not, it is definitely the better choice.
>> >
>> > Regards
>> > Christoph
>> >
>> >
>> > Am 02.04.2015 um 12:38 schrieb Gimantha Bandara:
>> >
>> >> Hi All,
>> >>
>> >> I have successfully setup a merged indices and drilldown and usual
>> search
>> >> operations work perfect.
>> >> But, I have a side question. If I selected RAMDirectory as the
>> destination
>> >> Indices in merging, probably the jvm can go out of memory if the merged
>> >> indices are too big. Is there a way I can handle this issue?
>> >>
>> >> On Tue, Mar 24, 2015 at 12:18 PM, Gimantha Bandara 
>> >> wrote:
>> >>
>> >>  Hi Christoph,
>> >>>
>> >>> My mistake. :) It does the exactly what i need. figured it out later..
>> >>> Thanks a lot!
>> >>>
>> >>> On Tue, Mar 24, 2015 at 3:14 AM, Gimantha Bandara 
>> >>> wrote:
>> >>>
>> >>>  Hi Christoph,
>> >>>>
>> >>>> I think TaxonomyMergeUtils is to merge a taxonomy directory and an
>> index
>> >>>> together (Correct me if I am wrong). Can it be used to merge several
>> >>>> taxonomyDirectories together and create one taxonomy index?
>> >>>>
>> >>>> On Mon, Mar 23, 2015 at 9:19 PM, Christoph Kaser <
>> >>>> lucene_l...@iconparc.de
>> >>>>
>> >>>>> wrote:
>> >>>>> Hi Gimantha,
>> >>>>>
>> >>>>> have a look at the class org.apache.lucene.facet.
>> >>>>> taxonomy.TaxonomyMergeUtils,
>> >>>>> which does exactly what you need.
>> >>>>>
>> >>>>> Best regards,
>> >>>>> Christoph
>> >>>>>
>> >>>>> Am 23.03.2015 um 15:44 schrieb Gimantha Bandara:
>> >>>>>
>> >>>>>  Hi all,
>> >>>>>>
>> >>>>>> Can anyone point me how to merge several taxonomy indexes? My
>> >>>>>> requirement
>> >>>>>> is as follows. I have  several taxonomy indexes and normal document
>> >>>>>> indexes. I want to merge taxonomy indexes together and other
>> document
>> >>>>>> indexes together and perform search on them. One part I have
>> figured
>> >>>>>> out.
>> >>>>>> It is easy. To Merge document indexes, all I have to do is create a
>> >>>>>> MultiReader and pass it to IndexSearcher. But I am stuck at merging
>> >>>>>> the
>> >>>>>> taxonomy indexes. Is there a way to merge taxonomy indexes?
>> >>>>>>
>> >>>>>>
>> >>>>>>  --
>> >>>>> Dipl.-Inf. Christoph Kaser
>> >>>>>

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Hi Shai

Currently I am using a DB, but the platform we are developing needs to
support RDBMS, HBase, and other datasource types for storing the indices,
so the user should be able to use whatever underlying storage he wants. I
am not sure whether Solr can support multiple datasource types, so I would
like to continue with Lucene and MMapDirectory. I will follow up if I have
a question. Thanks a lot!

On Thu, Apr 2, 2015 at 5:39 PM, Shai Erera  wrote:

> MMapDirectory uses memory-mapped files. This is an operating system level
> feature, where even though the file resides on disk, the OS can memory-map
> it and access it more efficiently. It is loaded into memory outside the JVM
> heap, and usually on a properly configured server you should not worry
> about running out of memory, since if the file cannot be brought into
> memory, it's accessed from disk.
>
> You mentioned that you store the index in a DB, which is distributed. Have
> you considered using Solr for managing your distributed index? It might be
> better than storing it in a DB, merging taxonomies for search etc. and Solr
> has quite rich faceted search capabilities.
>
> On Thu, Apr 2, 2015 at 1:51 PM, Gimantha Bandara 
> wrote:
>
> > Btw I was using a RAMDirectory for just testing purposes..
> >
> > On Thu, Apr 2, 2015 at 5:16 PM, Gimantha Bandara 
> > wrote:
> >
> > > Hi Christoph and Shai,
> > >
> > > Thanks for the quick response!.
> > > Indices are stored in a relational database ( using a custom Directory
> > > implementation ). The Problem comes since the indices are sharded (both
> > > taxonomy indices and normal doc indices), when a user wants to
> > drilldown, I
> > > have to merge all the indices. For that I used mergeUtils (which
> > > worksperfect). For now I am using RAMDirectory as the merged indices.
> > > Anyway The indices can grow to a bigger size as time goes.
> MMapDirectory
> > > again uses memory right? Can It deal with possible out of memory issue?
> > >
> > > I am thinking of using the same Database to store the merged indices.
> But
> > > the problem is the original sharded indices can be updated, when new
> > > entries come in. So the merged final indices also needs to be updated
> > > accordingly.
> > >
> > > On Thu, Apr 2, 2015 at 4:55 PM, Shai Erera  wrote:
> > >
> > >> In some cases, MMapDirectory offers even better performance, since the
> > JVM
> > >> doesn't need to manage that RAM when it's doing GC.
> > >>
> > >> Also, using only RAMDirectory is not safe in that if the JVM crashes,
> > your
> > >> index is lost.
> > >>
> > >> On Thu, Apr 2, 2015 at 12:54 PM, Christoph Kaser <
> > lucene_l...@iconparc.de
> > >> >
> > >> wrote:
> > >>
> > >> > Hi Gimantha,
> > >> >
> > >> > why do you use a RAMDirectory? If your merged index fits into RAM
> > >> > completely, a MMapDirectory should offer almost the same
> performance.
> > >> And
> > >> > if not, it is definitely the better choice.
> > >> >
> > >> > Regards
> > >> > Christoph
> > >> >
> > >> >
> > >> > Am 02.04.2015 um 12:38 schrieb Gimantha Bandara:
> > >> >
> > >> >> Hi All,
> > >> >>
> > >> >> I have successfully setup a merged indices and drilldown and usual
> > >> search
> > >> >> operations work perfect.
> > >> >> But, I have a side question. If I selected RAMDirectory as the
> > >> destination
> > >> >> Indices in merging, probably the jvm can go out of memory if the
> > merged
> > >> >> indices are too big. Is there a way I can handle this issue?
> > >> >>
> > >> >> On Tue, Mar 24, 2015 at 12:18 PM, Gimantha Bandara <
> > giman...@wso2.com>
> > >> >> wrote:
> > >> >>
> > >> >>  Hi Christoph,
> > >> >>>
> > >> >>> My mistake. :) It does the exactly what i need. figured it out
> > later..
> > >> >>> Thanks a lot!
> > >> >>>
> > >> >>> On Tue, Mar 24, 2015 at 3:14 AM, Gimantha Bandara <
> > giman...@wso2.com>
> > >> >>> wrote:
> > >> >>>
> > >> >>>  Hi Christoph,
> > >> >>>>
> > >> >

How to read multiple indices in parallel.

2015-04-07 Thread Gimantha Bandara
Hi all,

As far as I can see, MultiReader reads the multiple indices sequentially
(correct me if I am wrong), so an IndexSearcher over a MultiReader will also
perform sequential searches, right? Is there a Lucene built-in class to
search several indices in parallel?
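The behavior I am after can be shown with a toy, Lucene-free sketch: search each shard on its own thread via an ExecutorService and merge the per-shard top-k lists. Here each "shard" is just an array of scores, and Hit is a made-up stand-in for Lucene's ScoreDoc:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelShardSearch {
    // A hit: (shardId, docId, score). Toy stand-in for ScoreDoc.
    record Hit(int shard, int doc, double score) {}

    // Top-k hits of one shard, highest score first (min-heap of size k).
    static List<Hit> topK(int shard, double[] scores, int k) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (int doc = 0; doc < scores.length; doc++) {
            pq.offer(new Hit(shard, doc, scores[doc]));
            if (pq.size() > k) pq.poll(); // evict the current minimum
        }
        List<Hit> out = new ArrayList<>(pq);
        out.sort(Comparator.comparingDouble(Hit::score).reversed());
        return out;
    }

    // Search all shards in parallel, then merge the per-shard top-k lists.
    static List<Hit> search(double[][] shards, int k) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shards.length);
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (int s = 0; s < shards.length; s++) {
                final int shard = s;
                futures.add(pool.submit(() -> topK(shard, shards[shard], k)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) merged.addAll(f.get());
            merged.sort(Comparator.comparingDouble(Hit::score).reversed());
            return merged.subList(0, Math.min(k, merged.size()));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[][] shards = { {0.1, 0.9, 0.3}, {0.8, 0.2}, {0.95, 0.4} };
        for (Hit h : search(shards, 3)) {
            System.out.printf("shard=%d doc=%d score=%.2f%n", h.shard(), h.doc(), h.score());
        }
    }
}
```

This mirrors what an IndexSearcher constructed with an ExecutorService does across its leaves, just in simplified form.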

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How to read multiple indices in parallel.

2015-04-07 Thread Gimantha Bandara
Hi Terry,

I have multiple indices in separate locations. If I use a MultiReader and an
ExecutorService with the IndexSearcher, it will go through the segments in
parallel and search, right? But searching across the different indices will
still happen sequentially, won't it?

On Tue, Apr 7, 2015 at 7:15 PM, Terry Smith  wrote:

> Gimantha,
>
> With Lucene 5.0 you can pass in an ExecutorService to the constructor of
> your IndexSearcher and it will search the segments in parallel if you use
> one of the IndexSearcher.search() methods that returns a TopDocs (and don't
> supply your own Collector).
>
> The not-yet-released Lucene 5.1 includes some changes (LUCENE-6294
> <https://issues.apache.org/jira/browse/LUCENE-6294>) that enable better
> parallel query support.
>
> --Terry
>
>
> On Tue, Apr 7, 2015 at 8:09 AM, Gimantha Bandara 
> wrote:
>
> > Hi all,
> >
> > As I can see the Multireader is reading the multiple indices sequentially
> > (correct me if I am wrong). So using a IndexSearcher on a multireader
> will
> > also perform sequential searches right? Is there a lucene-built-in class
> to
> > search several indices parallely?
> >
> > --
> > Gimantha Bandara
> > Software Engineer
> > WSO2. Inc : http://wso2.com
> > Mobile : +94714961919
> >
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: How to read multiple indices in parallel.

2015-04-07 Thread Gimantha Bandara
That was really helpful. Thanks a lot Terry!

On Tue, Apr 7, 2015 at 8:17 PM, Terry Smith  wrote:

> Gimantha,
>
> Search will run in parallel even across indices.
>
> This happens because IndexSearcher searches by LeafReader and it doesn't
> matter where those LeafReaders come from (DirectoryReader or MultiReader)
> they are all treated equally.
>
> Example:
>
> DirectoryReader(A):
> LeafReader(B), LeafReader(C)
>
> DirectoryReader(D):
> LeafReader(E), LeafReader(F)
>
> Searching over A would use leaves B, C.
> Searching over D would use leaves E, F.
> Searching over a MultiReader on (A, D) would use leaves B, C, E, F.
>
> This runs serially by default but can run in parallel if you provide an
> ExecutorService to the IndexSearcher and use a compatible search() method
> on it.
>
> --Terry



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919
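Editor's note: the model Terry describes — every leaf searched as an
independent task, then the per-leaf top hits merged into one global top-N —
can be sketched without Lucene. This is a simplified stand-in, not Lucene's
actual IndexSearcher code: `Hit`, the leaf lists, and the merge are toy
types, and real Lucene merges with priority queues rather than re-sorting.

```java
import java.util.*;
import java.util.concurrent.*;

public class LeafSearchSketch {
    // A stand-in for a leaf reader's results: (docId, score) pairs.
    record Hit(int doc, double score) {}

    // Search every leaf in parallel, then merge the per-leaf top-n
    // into a single global top-n, highest score first.
    static List<Hit> search(List<List<Hit>> leaves, int n, ExecutorService pool) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (List<Hit> leaf : leaves)
            futures.add(pool.submit(() -> topN(leaf, n)));   // one task per leaf
        List<Hit> merged = new ArrayList<>();
        try {
            for (Future<List<Hit>> f : futures) merged.addAll(f.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        return topN(merged, n);                              // final merge
    }

    static List<Hit> topN(List<Hit> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> -h.score()))
                .limit(n).toList();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Leaves B, C come from one "index" and E, F from another; the merge
        // treats them identically, which is why a MultiReader parallelizes too.
        List<List<Hit>> leaves = List.of(
                List.of(new Hit(0, 1.2), new Hit(1, 0.4)),   // leaf B
                List.of(new Hit(2, 2.5)),                    // leaf C
                List.of(new Hit(3, 0.9)),                    // leaf E
                List.of(new Hit(4, 1.7), new Hit(5, 3.1)));  // leaf F
        for (Hit h : search(leaves, 3, pool))
            System.out.println(h.doc() + " " + h.score());   // 5, 2, 4 by score
        pool.shutdown();
    }
}
```

Because the leaf is the unit of work, it makes no difference whether the
leaves come from one DirectoryReader or from several wrapped in a
MultiReader.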


Joining two Indices in Lucene

2015-04-24 Thread Gimantha Bandara
Hi,

I am now looking into BlockJoinQuery, with which I can join two indices and
execute search queries. My concern is this: is it possible to perform "AND"
and "OR" operations between two separate indices? As an example, I have a
common field "_id" in both indices, and I will relate the two indices using
"fromField" and "toField". Let's say the field "title" exists only in the
first index and the field "ISBN" only in the second. I would like to get
the unique "_id" values where "title:SOME_VALUE" AND
"ISBN:SOME_OTHER_VALUE". Is this possible?

Thanks,
Gimantha
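Editor's note: whatever join mechanism is used (query-time joins such as
Lucene's JoinUtil address this kind of fromField/toField relation), the
requested AND semantics reduce to intersecting the matching "_id" sets from
each side, and OR to their union. The sketch below shows only that set
logic in plain Java; the maps are toy stand-ins for the two indices, not
Lucene's API.

```java
import java.util.*;

public class CrossIndexJoinSketch {
    // AND: ids whose title matches in index 1 AND whose ISBN matches in index 2.
    static Set<String> andJoin(Map<String, String> titleById, String title,
                               Map<String, String> isbnById, String isbn) {
        Set<String> ids = matches(titleById, title);
        ids.retainAll(matches(isbnById, isbn));   // set intersection = AND
        return ids;
    }

    // OR: union of the two matching id sets.
    static Set<String> orJoin(Map<String, String> titleById, String title,
                              Map<String, String> isbnById, String isbn) {
        Set<String> ids = matches(titleById, title);
        ids.addAll(matches(isbnById, isbn));      // set union = OR
        return ids;
    }

    // ids in one "index" whose field value matches exactly.
    static Set<String> matches(Map<String, String> index, String value) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : index.entrySet())
            if (e.getValue().equals(value)) out.add(e.getKey());
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> titles = Map.of("1", "Dune", "2", "Dune", "3", "Emma");
        Map<String, String> isbns  = Map.of("1", "978-0", "2", "978-1", "3", "978-0");
        System.out.println(andJoin(titles, "Dune", isbns, "978-0")); // [1]
        System.out.println(orJoin(titles, "Dune", isbns, "978-0"));  // [1, 2, 3]
    }
}
```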


Exception while updating a lucene document

2015-04-24 Thread Gimantha Bandara
Hi all,

I have documents with several facet fields. When I try to update a document
where the facet values are the same in two facet fields, I get the
following error. Note that I am using IndexWriter.updateDocument to create
the document.

Exception in thread "pool-23-thread-2" java.lang.IllegalArgumentException:
DocValuesField "$facets" appears more than once in this document (only one
value is allowed per field)
at
org.apache.lucene.index.BinaryDocValuesWriter.addValue(BinaryDocValuesWriter.java:70)
at
org.apache.lucene.index.DefaultIndexingChain.indexDocValue(DefaultIndexingChain.java:445)
at
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:392)
at
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:457)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1488)
at
org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.updateIndex(AnalyticsDataIndexer.java:1055)
at
org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexUpdateOpBatches(AnalyticsDataIndexer.java:370)
at
org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexUpdateOperations(AnalyticsDataIndexer.java:408)
at
org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:421)
at
org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$200(AnalyticsDataIndexer.java:115)
at
org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1731)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)


Any clue?

-- 
Thanks,
Gimantha


Re: Exception while updating a lucene document

2015-04-25 Thread Gimantha Bandara
Hi,

I was able to fix the problem. The issue was my incorrect usage of the
FacetsConfig class: I was creating the document with facetsConfig.build per
facet field, using a new FacetsConfig object for each facet field.

The solution was to use one FacetsConfig per document: add all the facet
fields to the document, call facetsConfig.build once at the end, and then
call updateDocument on the resulting document.





-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Exception while updating a lucene document

2015-04-26 Thread Gimantha Bandara
Hi Sheng,

I had already set multiValued to true. The problem was with how I used the
FacetsConfig object: I was building the document for each facet field with
updateDocument, and each time I built the document I was using a new
FacetsConfig object.

On Sun, Apr 26, 2015 at 6:12 AM, Sheng  wrote:

> seems like you forgot to do facetsConfig.setMultiValued(`field`, true) too
> .
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Migrating from Lucene 4.10.3 to Lucene 5.1.0

2015-05-21 Thread Gimantha Bandara
Hi all,
I was going through https://lucene.apache.org/core/5_1_0/MIGRATE.html
It says that Directory and LockFactory have been refactored.
We have implemented a custom Directory with SingleInstanceLockFactory. In
Lucene 4.10.3 we had the clearLock method, but I can't find it in 5.1.0.
How does Lucene 5.1.0 handle releasing the lock?

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Migrating from Lucene 4.10.3 to Lucene 5.1.0

2015-05-22 Thread Gimantha Bandara
Hi Uwe,

Yes, we have a completely custom Directory implementation, but we have not
hardcoded the lock factory; we simply pass a SingleInstanceLockFactory to
the constructor. I have implemented makeLock. I also see some additional
methods like renameFile, etc. I'll update you once we implement all the
necessary methods. Thanks a lot for your explanation!



On Thu, May 21, 2015 at 10:50 PM, Uwe Schindler  wrote:

> Hi,
>
> LockFactories are singletons in Lucene. Basically a directory does not
> even need a LockFactory, the LockFactories are just there to allow
> "configuring" it in FSDirectory subclasses. The abstract BaseDirectory
> class handles this for you, as it delegates all calls to the Directory to a
> given lock factory (this method is final). In your lock factory (ideally
> also a singleton) the method makeLock is where the lock instance should be
> created and returned based on the directory instance passed in. This lock
> instance is responsible to actually lock the directory and also to release
> the lock. There is no need to forcefully unlock, if you make sure that your
> code calls Lock#release() in a finally block.
>
> If you have a completely custom directory implementation with a
> hard-coded, non-configurable locking mechanism, it makes no sense to
> extend BaseDirectory; just extend Directory and implement the abstract
> method makeLock(). This method is responsible for creating the lock
> instance that handles actual locking and unlocking. Alternatively you can
> just hardcode SingleInstanceLockFactory (like RAMDirectory does).
>
> clearLock is no longer existent, because it was never used in Lucene. It
> was there to forcefully remove a lock, which is a bad idea. The only
> available method is, as said before, Directory#makeLock that should return
> a Lock instance.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919
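Editor's note: the pattern Uwe recommends — obtain the lock, do the work,
always release in a finally block — maps naturally onto try-with-resources.
The sketch below is not Lucene's Lock/LockFactory API; it is a pure-JDK
illustration of single-instance locking with guaranteed release, where
`SingleInstanceLockSketch`, `Lock`, and `obtainLock` are invented names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SingleInstanceLockSketch {
    // One flag shared by every lock this object hands out, mimicking
    // SingleInstanceLockFactory: at most one holder at a time.
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Closing the lock releases it, so try-with-resources plays the
    // role of the finally block; no forceful clearLock is needed.
    class Lock implements AutoCloseable {
        @Override public void close() { held.set(false); }
    }

    Lock obtainLock() {
        if (!held.compareAndSet(false, true))
            throw new IllegalStateException("lock is already held");
        return new Lock();
    }

    public static void main(String[] args) {
        SingleInstanceLockSketch dir = new SingleInstanceLockSketch();
        try (Lock lock = dir.obtainLock()) {     // obtain
            System.out.println("locked");        // ... index writing here ...
        }                                        // released even on exception
        try (Lock lock = dir.obtainLock()) {     // obtainable again after release
            System.out.println("locked again");
        }
    }
}
```

With this shape there is nothing for a clearLock-style method to do: a lock
that escapes its try block has already been released.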


Exception While searching through indices.

2015-06-12 Thread Gimantha Bandara
Hi all,

We are using Lucene 4.10.3 for indexing. Recently we changed our
implementation so that we feed data to Lucene in batches. Earlier we
queried all the data from the data source and indexed it at once, which
works well, but the number of entries can be in the billions, so fetching
all the entries from the data source sometimes causes an OutOfMemoryError.
So we changed the implementation so that Lucene indexes the data batchwise.
Now we are getting the following exception. Can anyone tell me what it
means?

java.lang.ArrayIndexOutOfBoundsException: 147
at
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.advance(Lucene41PostingsReader.java:538)
at org.apache.lucene.search.TermScorer.advance(TermScorer.java:85)
at
org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:82)
at
org.apache.lucene.search.ConjunctionScorer.nextDoc(ConjunctionScorer.java:100)
at
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
at
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
at
org.apache.lucene.facet.FacetsCollector.doSearch(FacetsCollector.java:294)
at
org.apache.lucene.facet.FacetsCollector.search(FacetsCollector.java:198)


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Exception While searching through indices.

2015-06-14 Thread Gimantha Bandara
Hi Dat,

I can reproduce this behavior even with as few as 5 records. Is the limit
you mentioned the only reason this exception can occur?

Thanks,

On Sat, Jun 13, 2015 at 5:40 AM, Đạt Cao Mạnh 
wrote:

> Hi, the total number of documents in an index of lucene is
> Integer.MAX_VALUE. So using a single lucene index to index billions
> documents is not a proper ways. You should consider using Solr Cloud or
> Elasticsearch to index your documents.



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Exception While searching through indices.

2015-06-16 Thread Gimantha Bandara
Hi Dat,

We have an entity called a 'record', which contains a record id, a table
name, and a set of values. When we insert records into our data layer, we
index them by id and values; indexing is done in a separate thread. Here is
how it works: when records are inserted, we store them as blobs in the
underlying data source (for an RDBMS they are literally blobs), and we also
insert an 'index record' into another table; that index record contains all
the record ids that need to be indexed. The separate indexing thread
extracts these index records (there can be several) and, from them, the
record ids that need to be indexed. Previously we extracted all the index
records at once, then extracted the record ids in them and indexed them
with Lucene; our performance tests and unit tests all passed. We then
changed the implementation to use iterators to extract these records, since
keeping all the records in a List can cause OOM issues. Now the tests pass
except for facet indexing.

I know the context of the problem is not easy to follow, so I have linked
our source at [1]. When we use the method at line 312 instead of the method
at line 330, we get the above error. Note that the method is used at line
422.


[1]
https://github.com/gimantha/carbon-analytics/blob/master/components/analytics-core/org.wso2.carbon.analytics.dataservice/src/main/java/org/wso2/carbon/analytics/dataservice/indexing/AnalyticsDataIndexer.java

On Sun, Jun 14, 2015 at 7:13 PM, Đạt Cao Mạnh 
wrote:

> Can you post your scenario in detail along with your modification, please?
>
>


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Using lucene queries to search StringFields

2015-06-18 Thread Gimantha Bandara
Hi all,

I have created lucene documents like below.

Document doc = new Document();
doc.add(new TextField("A", "1", Field.Store.YES));
doc.add(new StringField("B", "1 2 3", Field.Store.NO));
doc.add(new TextField("Publish Date", "2010", Field.Store.NO));
indexWriter.addDocument(doc);

doc = new Document();
doc.add(new TextField("A", "2", Field.Store.YES));
doc.add(new StringField("B", "1 2", Field.Store.NO));
doc.add(new TextField("Publish Date", "2010", Field.Store.NO));
indexWriter.addDocument(doc);

doc = new Document();
doc.add(new TextField("A", "3", Field.Store.YES));
doc.add(new StringField("B", "1", Field.Store.NO));
doc.add(new TextField("Publish Date", "2012", Field.Store.NO));
indexWriter.addDocument(doc);

Now I am using the following code to test the StringField behavior.

Query w = null;
try {
w = new QueryParser(null, new WhitespaceAnalyzer()).parse("B:1
2");
} catch (ParseException e) {
e.printStackTrace();
}
TopScoreDocCollector collector = TopScoreDocCollector.create(100,
true);
searcher.search(w, collector);
ScoreDoc[] hits = collector.topDocs(0).scoreDocs;
Document indexDoc;
for (ScoreDoc doc : hits) {
indexDoc = searcher.doc(doc.doc);
System.out.println(indexDoc.get("A"));
}

The above code should print only the second document's 'A' value, as it is
the only document where 'B' has the value '1 2'. But it returns the 3rd
document instead. So I tried putting double quotation marks around the 'B'
value, as below.

w = new QueryParser(null, new WhitespaceAnalyzer()).parse("\"B:1 2\"");

It gives the following error.

Exception in thread "main" java.lang.IllegalStateException: field "B" was
indexed without position data; cannot run PhraseQuery (term=1)
at
org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)

Is my search query wrong? (Note: I am using WhitespaceAnalyzer everywhere.)

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Using lucene queries to search StringFields

2015-06-18 Thread Gimantha Bandara
Correction:

The second time I used the following code to test; that is when I got the
IllegalStateException above:

w = new QueryParser(null, new WhitespaceAnalyzer()).parse("B:\"1 2\"");

not the one below:

w = new QueryParser(null, new WhitespaceAnalyzer()).parse("\"B:1 2\"");

Can someone point out the correct way to query StringFields?

Thanks,




-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Using lucene queries to search StringFields

2015-06-21 Thread Gimantha Bandara
@Sheng
I am using StandardAnalyzer.

@Ahmet
I know that building the query object directly would simply work. But I
have a requirement where the user enters the whole query string, and I want
to return the docs that exactly match the given text.

On Fri, Jun 19, 2015 at 9:23 PM, Sheng  wrote:

> 1. What is the analyzer are you using for indexing ?
> 2. you cannot fuzzy match field name - that for sure will throw exception
> 3. I would start from a simple, deterministic query object to rule out all
> unlikely possibilities first before resorting to parser to generate that
> for you.
>
>
> On Fri, Jun 19, 2015 at 10:45 AM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > Why don't you create your query with API?
> >
> > Term term = new Term("B", "1 2");
> > Query query = new TermQuery(term);
> >
> > Ahmet
> >
> >
> >
> > On Friday, June 19, 2015 9:31 AM, Gimantha Bandara 
> > wrote:
> > Correction..
> >
> > second time I used the following code to test. Then I got the above
> > IllegalStateException issue.
> >
> > w = new QueryParser(null, new WhitespaceAnalyzer()).parse("*B:\"1 2\"*");
> >
> > not the below one.
> >
> > w = new QueryParser(null, new WhitespaceAnalyzer()).parse("*\**"B:1
> 2\"*");
> >
> > Can someone point out the correct way to query for StringFields?
> >
> > Thanks,
> >
> > On Thu, Jun 18, 2015 at 2:12 PM, Gimantha Bandara 
> > wrote:
> >
> > > Hi all,
> > >
> > > I have created lucene documents like below.
> > >
> > > Document doc = new Document();
> > > doc.add(new TextField("A", "1", Field.Store.YES));
> > > doc.add(new StringField("B", "1 2 3", Field.Store.NO));
> > > doc.add(new TextField("Publish Date", "2010", Field.Store.NO));
> > > indexWriter.addDocument(doc);
> > >
> > > doc = new Document();
> > > doc.add(new TextField("A", "2", Field.Store.YES));
> > > doc.add(new StringField("B", "1 2", Field.Store.NO));
> > > doc.add(new TextField("Publish Date", "2010", Field.Store.NO));
> > > indexWriter.addDocument(doc);
> > >
> > > doc = new Document();
> > > doc.add(new TextField("A", "3", Field.Store.YES));
> > > doc.add(new StringField("B", "1", Field.Store.NO));
> > > doc.add(new TextField("Publish Date", "2012", Field.Store.NO));
> > > indexWriter.addDocument(doc);
> > >
> > > Now I am using the following code to test the StringField behavior.
> > >
> > > Query w = null;
> > > try {
> > > w = new QueryParser(null, new
> > WhitespaceAnalyzer()).parse("B:1
> > > 2");
> > > } catch (ParseException e) {
> > > e.printStackTrace();
> > > }
> > > TopScoreDocCollector collector =
> TopScoreDocCollector.create(100,
> > > true);
> > > searcher.search(w, collector);
> > > ScoreDoc[] hits = collector.topDocs(0).scoreDocs;
> > > Document indexDoc;
> > > for (ScoreDoc doc : hits) {
> > > indexDoc = searcher.doc(doc.doc);
> > > System.out.println(indexDoc.get("A"));
> > > }
> > >
> > > Above code should print only the second document's 'A' value as it is
> the
> > > only one where 'B' has value '1 2'. But it returns the 3rd document.
> So I
> > > tried using double quotation marks for 'B' value as below.
> > >
> > > w = new QueryParser(null, new WhitespaceAnalyzer()).parse("\"B:1 2\"");
> > >
> > > It gives the following error.
> > >
> > > Exception in thread "main" java.lang.IllegalStateException: field "B"
> > > was indexed without position data; cannot run PhraseQuery (term=1)
> > >     at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
> > >     at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
> > >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
> > >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
> > >
> > > Is my search query wrong? (Note: I am using the WhitespaceAnalyzer
> > > everywhere.)
> > >
> > > --
> > > Gimantha Bandara
> > > Software Engineer
> > > WSO2. Inc : http://wso2.com
> > > Mobile : +94714961919
> >
> > >
> >
> >
> >
> > --
> > Gimantha Bandara
> > Software Engineer
> > WSO2. Inc : http://wso2.com
> > Mobile : +94714961919
> >
> >
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919
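Since a StringField indexes the whole value as a single un-analyzed token, the query parser's whitespace splitting can never produce a matching term (the clause B:1 is what matched the third document), and the quoted form becomes a PhraseQuery, which fails because StringField records no position data. A TermQuery built directly against the raw value does match. A minimal, self-contained sketch of that approach, assuming Lucene 5.x and an in-memory index:

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class StringFieldExactMatch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("A", "2", Field.Store.YES));
            // StringField indexes "1 2" as ONE token: no analysis, no positions
            doc.add(new StringField("B", "1 2", Field.Store.NO));
            writer.addDocument(doc);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Bypass the query parser and match the un-analyzed token exactly
            TopDocs hits = searcher.search(new TermQuery(new Term("B", "1 2")), 10);
            System.out.println("hits=" + hits.totalHits);
        }
    }
}
```

If the field must also support phrase or per-token queries, index it as a TextField instead (or in addition), since only analyzed fields carry position data.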


Re: Document updates work as delete/add under the hood

2015-07-10 Thread Gimantha Bandara
Hi Chalitha,

You can simply use indexWriter.updateDocument() to update existing index
documents.

On Fri, Jul 10, 2015 at 11:38 AM, chalitha udara Perera <
chalithaud...@gmail.com> wrote:

> Hi All,
>
> I have a requirement for updating a Lucene index (adding a single field to
> existing docs and modifying the value of another field). These documents
> contain many other fields that do not need any modifications. But as I
> understand, Lucene provides a delete/add mechanism for even single-field
> value updates. I would really appreciate it if someone could explain why
> Lucene uses delete/add for updates, as it feels like a real bottleneck.
>
> Is there any way to do single-field updates without using delete/add?
>
> Thanks,
> Chalitha
>
> --
> J.M Chalitha Udara Perera
>
> *Department of Computer Science and Engineering,*
> *University of Moratuwa,*
> *Sri Lanka*
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


Re: Document updates work as delete/add under the hood

2015-07-10 Thread Gimantha Bandara
Ah, I misread the thread. I thought you were using two APIs to achieve what
updateDocument already does. Yes, it is an overhead, and it is harder for the
user to keep track of the fields that don't need updating. There is already a
JIRA open for this [1].

[1] https://issues.apache.org/jira/browse/LUCENE-4258
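For numeric (and binary) doc-values fields specifically, later 4.x and 5.x releases do offer an in-place alternative that grew out of that JIRA line of work: IndexWriter.updateNumericDocValue updates just the doc-values column for all documents matching a term, without deleting and re-adding the whole document. A sketch under those assumptions (Lucene 5.x, a StringField "id" used as the exact-match key):

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class DocValuesUpdate {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()));

        Document doc = new Document();
        doc.add(new StringField("id", "42", Field.Store.NO));   // exact-match key
        doc.add(new NumericDocValuesField("price", 100L));      // updatable column
        writer.addDocument(doc);
        writer.commit();

        // Update only the doc-values field; no delete/re-add of the document
        writer.updateNumericDocValue(new Term("id", "42"), "price", 199L);
        writer.commit();

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            LeafReader leaf = reader.leaves().get(0).reader();
            System.out.println("price=" + leaf.getNumericDocValues("price").get(0));
        }
        writer.close();
    }
}
```

Note the restriction: this only works for fields indexed as doc values, not for stored or inverted (searchable text) fields, so it does not remove the delete/add behavior for general document updates.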

On Fri, Jul 10, 2015 at 1:58 PM, chalitha udara Perera <
chalithaud...@gmail.com> wrote:

> Hi Gimantha,
>
> Yes, it is possible to use IndexWriter's updateDocument() to update a
> document. But what happens under the hood with that method is that it
> deletes the matching documents and re-indexes the new document. I need to
> update only a single field. Re-indexing a new document with the updated
> field plus all the other fields seems to be a big overhead. My question is:
> why does Lucene do that, and is there currently a way to avoid it?
>
> Thanks,
> Chalitha
>
> On Fri, Jul 10, 2015 at 1:46 PM, Gimantha Bandara 
> wrote:
>
> > Hi Chalitha,
> >
> > You can simply use indexWriter.updateDocument() to update existing index
> > documents.
> >
> > On Fri, Jul 10, 2015 at 11:38 AM, chalitha udara Perera <
> > chalithaud...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I have a requirement for updating a Lucene index (adding a single field
> > > to existing docs and modifying the value of another field). These
> > > documents contain many other fields that do not need any modifications.
> > > But as I understand, Lucene provides a delete/add mechanism for even
> > > single-field value updates. I would really appreciate it if someone
> > > could explain why Lucene uses delete/add for updates, as it feels like
> > > a real bottleneck.
> > >
> > > Is there any way to do single-field updates without using delete/add?
> > >
> > > Thanks,
> > > Chalitha
> > >
> > > --
> > > J.M Chalitha Udara Perera
> > >
> > > *Department of Computer Science and Engineering,*
> > > *University of Moratuwa,*
> > > *Sri Lanka*
> > >
> >
> >
> >
> > --
> > Gimantha Bandara
> > Software Engineer
> > WSO2. Inc : http://wso2.com
> > Mobile : +94714961919
> >
>
>
>
> --
> J.M Chalitha Udara Perera
>
> *Department of Computer Science and Engineering,*
> *University of Moratuwa,*
> *Sri Lanka*
>



-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919


GROUP BY in Lucene

2015-08-09 Thread Gimantha Bandara
Hi all,

Is there a way to achieve $subject? For example, consider the following SQL
query.

SELECT A, B, C, SUM(D) AS E FROM `table` WHERE time BETWEEN fromDate AND
toDate *GROUP BY X,Y,Z*

In the above query we can group the records by X,Y,Z. Is there a way to
achieve the same in Lucene? (I guess faceting would help, but is it possible
to get all the categoryPaths along with the matching records?) Is there any
other way other than using facets?

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919
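Besides facets, one option is Lucene's grouping module: if the group-by field is indexed as a SortedDocValuesField, GroupingSearch can bucket the matching documents by that field and return the documents inside each group. A hedged sketch (Lucene 5.x, grouping by a single field "X"; grouping by X,Y,Z together would need a concatenated key field or nested passes, and the SUM(D) aggregation would still be computed per group, e.g. via the facet module's value-source summing):

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public class GroupByField {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()));
        for (String group : new String[] {"g1", "g1", "g2"}) {
            Document doc = new Document();
            doc.add(new StringField("X", group, Field.Store.YES));
            doc.add(new SortedDocValuesField("X", new BytesRef(group))); // grouping reads doc values
            writer.addDocument(doc);
        }
        writer.close();

        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        GroupingSearch grouping = new GroupingSearch("X");
        grouping.setGroupDocsLimit(10); // documents kept per group
        TopGroups<BytesRef> groups = grouping.search(searcher, new MatchAllDocsQuery(), 0, 10);
        for (GroupDocs<BytesRef> g : groups.groups) {
            System.out.println(g.groupValue.utf8ToString() + " -> " + g.totalHits + " docs");
        }
    }
}
```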


Equivalent API in Lucene 5.x.x for ResultMode.setResultMode and ResultMode.setDepth

2015-10-06 Thread Gimantha Bandara
Hi,

I want to get the top categories (all the categories) recursively in one
call. I know that in Lucene 4.x.x we can simply set the ResultMode to
PER_NODE_IN_TREE [1] and set the depth, so we can get the categories
recursively to the level we want in the tree. How do I achieve the same in
Lucene 5.2.1?

[1] http://shaierera.blogspot.com/2012/12/lucene-facets-under-hood.html
Thanks,
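The 5.x facet API dropped the FacetRequest ResultMode/depth knobs; the usual replacement is to call Facets.getTopChildren() per node and recurse to whatever depth is needed, walking the FacetResult.labelValues of each node. A self-contained sketch under that assumption (Lucene 5.x taxonomy facets, a hierarchical "Date" dimension invented for illustration):

```java
import java.util.Arrays;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.LabelAndValue;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.RAMDirectory;

public class FacetTree {
    // Recursively print the children of `path` under dimension `dim`, up to `depth` levels
    static void printTree(Facets facets, String dim, String[] path, int depth) throws Exception {
        if (depth == 0) return;
        FacetResult result = facets.getTopChildren(10, dim, path);
        if (result == null) return; // no children under this path
        for (LabelAndValue lv : result.labelValues) {
            String[] child = Arrays.copyOf(path, path.length + 1);
            child[path.length] = lv.label;
            System.out.println(dim + "/" + String.join("/", child) + " = " + lv.value);
            printTree(facets, dim, child, depth - 1);
        }
    }

    public static void main(String[] args) throws Exception {
        RAMDirectory indexDir = new RAMDirectory(), taxoDir = new RAMDirectory();
        FacetsConfig config = new FacetsConfig();
        config.setHierarchical("Date", true);

        IndexWriter writer = new IndexWriter(indexDir, new IndexWriterConfig(new WhitespaceAnalyzer()));
        DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
        for (String[] d : new String[][] {{"2010", "02"}, {"2010", "03"}, {"2012", "01"}}) {
            Document doc = new Document();
            doc.add(new FacetField("Date", d));
            writer.addDocument(config.build(taxoWriter, doc));
        }
        writer.close();
        taxoWriter.close();

        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(indexDir));
        DirectoryTaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
        FacetsCollector fc = new FacetsCollector();
        FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
        Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);

        printTree(facets, "Date", new String[0], 2); // two levels deep, like depth=2 in 4.x
    }
}
```

The trade-off versus the old PER_NODE_IN_TREE mode is one getTopChildren() call per visited node rather than one call for the whole tree, which is usually acceptable for shallow trees.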


Re: Equivalent API in Lucene 5.x.x for ResultMode.setResultMode and ResultMode.setDepth

2015-10-09 Thread Gimantha Bandara
Any help on this, please?

On 10/6/15, Gimantha Bandara  wrote:
> Hi,
>
> I want to get the top categories (all the categories) recursively in one
> call. I know that in Lucene 4.x.x we can simply set the ResultMode to
> PER_NODE_IN_TREE [1] and set the depth, so we can get the categories
> recursively to the level we want in the tree. How do I achieve the same in
> Lucene 5.2.1?
>
> [1] http://shaierera.blogspot.com/2012/12/lucene-facets-under-hood.html
> Thanks,
>


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org