Re: docValues usage

2020-11-04 Thread Wei
And when both stored=true and docValues=true are set, will Solr 8.x
choose the optimal approach by itself?

On Wed, Nov 4, 2020 at 9:15 AM Wei  wrote:

> Thanks Erick. As indexed is not necessary,  and docValues is more
> efficient than stored fields for function queries, so  we shall go with the
> following:
>
>   3) indexed=false,  stored=false,  docValues=true.
>
> Is my understanding correct?
>
> Best,
> Wei
>
> On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson 
> wrote:
>
>> You don’t need to index the field for function queries, see:
>> https://lucene.apache.org/solr/guide/8_6/docvalues.html.
>>
>> Function queries, as opposed to sorting, faceting and grouping are
>> evaluated at search time where the
>> search process is already parked on the document anyway, so answering the
>> question “for doc X, what
>> is the value of field Y” to compute the score. DocValues are still more
>> efficient I think, although I
>> haven’t measured explicitly...
>>
>> For sorting, faceting and grouping, it’s a much different story. Take
>> sorting. You have to ask
>> “for field Y, what’s the value in docX and docZ?”. Say you’re parked on
>> docX. Doc Z is long gone
>> and getting the value for field Y much more expensive.
>>
>> Also, docValues will not increase memory requirements _unless used_.
>> Otherwise they’ll
>> just sit there on disk. They will certainly increase disk space whether
>> used or not.
>>
>> And _not_ using docValues when you facet, group or sort will also
>> _certainly_ increase
>> your heap requirements since the docValues structure must be built on the
>> heap rather
>> than be in MMapDirectory space.
>>
>> Best,
>> Erick
>>
>>
>> > On Nov 4, 2020, at 5:32 AM, uyilmaz 
>> wrote:
>> >
>> > Hi,
>> >
>> > I'm by no means expert on this so if anyone sees a mistake please
>> correct me.
>> >
>> > I think you need to index this field, since boost functions are added
>> to the query as optional clauses (
>> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
>> It's like boosting a regular field by putting ^2 next to it in a query.
>> Storing or enabling docValues will unnecesarily consume space/memory.
>> >
>> > On Tue, 3 Nov 2020 16:10:50 -0800
>> > Wei  wrote:
>> >
>> >> Hi,
>> >>
>> >> I have a couple of primitive single value numeric type fields,  their
>> >> values are used in boosting functions, but not used in sort/facet. or
>> in
>> >> returned response.   Should I use docValues for them in the schema?  I
>> can
>> >> think of the following options:
>> >>
>> >> 1)   indexed=true,  stored=true, docValues=false
>> >> 2)   indexed=true, stored=false, docValues=true
>> >> 3)   indexed=false,  stored=false,  docValues=true
>> >>
>> >> What would be the performance implications for these options?
>> >>
>> >> Best,
>> >> Wei
>> >
>> >
>> > --
>> > uyilmaz 
>>
>>


Re: docValues usage

2020-11-04 Thread Wei
Thanks Erick. Since indexing is not necessary, and docValues is more efficient
than stored fields for function queries, we shall go with the
following:

  3) indexed=false,  stored=false,  docValues=true.

Is my understanding correct?

Best,
Wei

On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson 
wrote:

> You don’t need to index the field for function queries, see:
> https://lucene.apache.org/solr/guide/8_6/docvalues.html.
>
> Function queries, as opposed to sorting, faceting and grouping are
> evaluated at search time where the
> search process is already parked on the document anyway, so answering the
> question “for doc X, what
> is the value of field Y” to compute the score. DocValues are still more
> efficient I think, although I
> haven’t measured explicitly...
>
> For sorting, faceting and grouping, it’s a much different story. Take
> sorting. You have to ask
> “for field Y, what’s the value in docX and docZ?”. Say you’re parked on
> docX. Doc Z is long gone
> and getting the value for field Y much more expensive.
>
> Also, docValues will not increase memory requirements _unless used_.
> Otherwise they’ll
> just sit there on disk. They will certainly increase disk space whether
> used or not.
>
> And _not_ using docValues when you facet, group or sort will also
> _certainly_ increase
> your heap requirements since the docValues structure must be built on the
> heap rather
> than be in MMapDirectory space.
>
> Best,
> Erick
>
>
> > On Nov 4, 2020, at 5:32 AM, uyilmaz  wrote:
> >
> > Hi,
> >
> > I'm by no means expert on this so if anyone sees a mistake please
> correct me.
> >
> > I think you need to index this field, since boost functions are added to
> the query as optional clauses (
> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
> It's like boosting a regular field by putting ^2 next to it in a query.
> Storing or enabling docValues will unnecesarily consume space/memory.
> >
> > On Tue, 3 Nov 2020 16:10:50 -0800
> > Wei  wrote:
> >
> >> Hi,
> >>
> >> I have a couple of primitive single value numeric type fields,  their
> >> values are used in boosting functions, but not used in sort/facet. or in
> >> returned response.   Should I use docValues for them in the schema?  I
> can
> >> think of the following options:
> >>
> >> 1)   indexed=true,  stored=true, docValues=false
> >> 2)   indexed=true, stored=false, docValues=true
> >> 3)   indexed=false,  stored=false,  docValues=true
> >>
> >> What would be the performance implications for these options?
> >>
> >> Best,
> >> Wei
> >
> >
> > --
> > uyilmaz 
>
>


Re: docValues usage

2020-11-04 Thread Erick Erickson
You don’t need to index the field for function queries, see: 
https://lucene.apache.org/solr/guide/8_6/docvalues.html.

Function queries, as opposed to sorting, faceting and grouping, are evaluated at
search time, where the search process is already parked on the document anyway,
so answering the question “for doc X, what is the value of field Y?” to compute
the score is cheap. DocValues are still more efficient I think, although I
haven’t measured explicitly...
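
As a rough illustration of the function-query case, here is a minimal sketch that builds the parameters for an edismax request with an additive boost function. The field name `popularity` is hypothetical, and the sketch assumes that field is declared with indexed=false, stored=false, docValues=true (option 3 from the original question). It only builds the query string; where to send it depends on your setup.

```python
from urllib.parse import urlencode

# Hypothetical field name; assumed to be a single-valued numeric field with
# indexed="false" stored="false" docValues="true".
BOOST_FIELD = "popularity"


def build_boost_query(user_query: str, boost_field: str = BOOST_FIELD) -> str:
    """Return a URL query string for an edismax request with a boost function."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # bf adds the value of the function to the score of each matching doc.
        "bf": f"log(sum({boost_field},1))",
        "fl": "id,score",
    }
    return urlencode(params)


query_string = build_boost_query("solr docvalues")
print(query_string)
```

The function only reads the field value per matching document, which is why the field does not have to be indexed for this to work.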

For sorting, faceting and grouping, it’s a much different story. Take sorting.
You have to ask “for field Y, what’s the value in docX and docZ?”. Say you’re
parked on docX. DocZ is long gone, and getting its value for field Y is much
more expensive.

Also, docValues will not increase memory requirements _unless used_. Otherwise
they’ll just sit there on disk. They will certainly increase disk space whether
used or not.

And _not_ using docValues when you facet, group or sort will also _certainly_
increase your heap requirements, since an equivalent of the docValues structure
must then be built on the heap rather than living in MMapDirectory space.

Best,
Erick


> On Nov 4, 2020, at 5:32 AM, uyilmaz  wrote:
> 
> Hi,
> 
> I'm by no means expert on this so if anyone sees a mistake please correct me.
> 
> I think you need to index this field, since boost functions are added to the 
> query as optional clauses 
> (https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
>  It's like boosting a regular field by putting ^2 next to it in a query. 
> Storing or enabling docValues will unnecesarily consume space/memory.
> 
> On Tue, 3 Nov 2020 16:10:50 -0800
> Wei  wrote:
> 
>> Hi,
>> 
>> I have a couple of primitive single value numeric type fields,  their
>> values are used in boosting functions, but not used in sort/facet. or in
>> returned response.   Should I use docValues for them in the schema?  I can
>> think of the following options:
>> 
>> 1)   indexed=true,  stored=true, docValues=false
>> 2)   indexed=true, stored=false, docValues=true
>> 3)   indexed=false,  stored=false,  docValues=true
>> 
>> What would be the performance implications for these options?
>> 
>> Best,
>> Wei
> 
> 
> -- 
> uyilmaz 



Re: docValues usage

2020-11-04 Thread uyilmaz
Hi,

I'm by no means an expert on this, so if anyone sees a mistake please correct me.

I think you need to index this field, since boost functions are added to the
query as optional clauses
(https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
It's like boosting a regular field by putting ^2 next to it in a query.
Storing or enabling docValues will unnecessarily consume space/memory.

On Tue, 3 Nov 2020 16:10:50 -0800
Wei  wrote:

> Hi,
> 
> I have a couple of primitive single value numeric type fields,  their
> values are used in boosting functions, but not used in sort/facet. or in
> returned response.   Should I use docValues for them in the schema?  I can
> think of the following options:
> 
>  1)   indexed=true,  stored=true, docValues=false
>  2)   indexed=true, stored=false, docValues=true
>  3)   indexed=false,  stored=false,  docValues=true
> 
> What would be the performance implications for these options?
> 
> Best,
> Wei


-- 
uyilmaz 


Re: DocValues or stored fields to enable atomic updates

2019-04-05 Thread Emir Arnautović
Hi Andreas,
Stored values are compressed, so they should take less disk space. I am thinking
that doc values might perform better when it comes to executing atomic updates.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 5 Apr 2019, at 12:54, Andreas Hubold  wrote:
> 
> Hi,
> 
> I have a question on schema design: If a single-valued StrField is just used 
> for filtering results by exact value (indexed=true) and its value isn't 
> needed in the search result and not for sorting, faceting or highlighting - 
> should I use docValues=true or stored=true to enable atomic updates? Or even 
> both? I understand that either docValues or stored fields are needed for 
> atomic updates but which of the two would perform better / consume less 
> resources in this scenario?
> 
> Thank you.
> 
> Best regards,
> Andreas
> 
> 
> 



Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-14 Thread Emir Arnautović
Hi Ganesh,
I cannot confirm for sure, but I would assume that the document will not get
reindexed; only the segment's doc values file will be rewritten. It is best to
test this and see for yourself.

Regards,
Emir



> On 14 Feb 2018, at 13:09, mganeshs  wrote:
> 
> Hi Emir,
> 
> Thanks for confirming that strField is not considered / available for in
> place updates. 
> 
> As per documentation, it says...
> 
> *An atomic update operation is performed using this approach only when the
> fields to be updated meet these three conditions:
> 
> are non-indexed (indexed="false"), non-stored (stored="false"), single
> valued (multiValued="false") numeric docValues (docValues="true") fields;
> 
> the _version_ field is also a non-indexed, non-stored single valued
> docValues field; and,
> 
> copy targets of updated fields, if any, are also non-indexed, non-stored
> single valued numeric docValues fields.*
> 
> Let's consider I have declared following three fields in the schema
> 
> id
> 
>  docValues="false"/>
>  docValues="false"/>
>  docValues="true"/>
> 
> With this I am trying to create couple of solr document ( id =1) with only
> Field1 and Field2 and it's also indexed. And I could search the documents
> based on Field1 and Field2
> 
> Now after a while, I am adding a new field called Field3 by passing the id
> field ( id=1) and Field3 ( Field3=100 ( which is docvalues field in our case
> ).
> 
> What will happen now ? Will the complete document gets re indexed or only
> Field3 get added under docValues ?
> 
> Pls confirm.
> 
> Regards,
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-14 Thread mganeshs
Hi Emir,

Thanks for confirming that StrField is not eligible for in-place
updates.

The documentation says:

*An atomic update operation is performed using this approach only when the
fields to be updated meet these three conditions:

are non-indexed (indexed="false"), non-stored (stored="false"), single
valued (multiValued="false") numeric docValues (docValues="true") fields;

the _version_ field is also a non-indexed, non-stored single valued
docValues field; and,

copy targets of updated fields, if any, are also non-indexed, non-stored
single valued numeric docValues fields.*

Let's consider I have declared following three fields in the schema

id





With this schema I create a couple of Solr documents (e.g. id=1) with only
Field1 and Field2, and they are indexed. I can search the documents
based on Field1 and Field2.

After a while, I add a new field, Field3, by sending an update with the id
field (id=1) and Field3=100 (Field3 is the docValues field in our case).

What will happen now? Will the complete document get reindexed, or will only
Field3 be added under docValues?

Please confirm.

Regards,





Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-14 Thread Emir Arnautović
Hi Ganesh,
Doc values are supported for StrField and UUID fields, but in-place updates are not.

"Not free" means the following: according to some discussions on the mailing
list (I did not check the code), an in-place update does not update a single
value in the doc values file; it rewrites the doc values file for the segment
that holds the updated doc. If the updated docs are in a larger segment, a
larger doc values file will be rewritten.

Regards,
Emir




> On 14 Feb 2018, at 04:38, mganeshs  wrote:
> 
> Hi,
> 
> Thanks for clearing.
> 
> But as per this  link
> 
>  
> (Enabling DocValues) it says that it supports strField and UUID field also. 
> 
> Again, what you mean by it's not free for large segments. Can you point me
> to some documentation on that ?
> 
> Regards,
> Ganesh
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-13 Thread mganeshs
Hi,

Thanks for clarifying.

But the linked documentation (Enabling DocValues) says that docValues also
supports StrField and UUID fields.

Also, what do you mean by "not free for large segments"? Can you point me
to some documentation on that?

Regards,
Ganesh





Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-13 Thread Emir Arnautović
Hi,
It is clearer now, but you mentioned strings in your first mail, and in-place
updates only work for numeric fields. If you meet all the conditions, the
document will not be reindexed; only the doc values will be rewritten for the
segment where the in-place update happened. Note that this is not free for
large segments.
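
To make the conditions concrete, here is a small sketch that encodes the three documented conditions for in-place updates (as quoted from the reference guide elsewhere in this thread) as a check over a simplified schema model. The `FieldDef` class and both function names are hypothetical helpers for illustration, not Solr APIs, and the field types are assumptions because the schema in the original question did not survive. The last lines build the JSON body of an atomic "set" update for Field3.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical, simplified view of one schema field."""
    name: str
    numeric: bool
    indexed: bool
    stored: bool
    doc_values: bool
    multi_valued: bool = False


def is_plain_numeric_doc_values(field: FieldDef) -> bool:
    """True for a non-indexed, non-stored, single-valued numeric docValues field."""
    return (
        field.numeric
        and field.doc_values
        and not field.indexed
        and not field.stored
        and not field.multi_valued
    )


def in_place_update_possible(updated, version_field, copy_targets=()):
    """Apply the three documented conditions to a proposed update."""
    return (
        all(is_plain_numeric_doc_values(f) for f in updated)
        and is_plain_numeric_doc_values(version_field)
        and all(is_plain_numeric_doc_values(f) for f in copy_targets)
    )


# Field names follow the thread; the types and flags are assumptions.
version = FieldDef("_version_", numeric=True, indexed=False, stored=False, doc_values=True)
field1 = FieldDef("Field1", numeric=False, indexed=True, stored=True, doc_values=False)
field3 = FieldDef("Field3", numeric=True, indexed=False, stored=False, doc_values=True)

print(in_place_update_possible([field3], version))
print(in_place_update_possible([field1], version))

# JSON body of an atomic update that sets Field3 on the document with id=1.
update_body = json.dumps([{"id": "1", "Field3": {"set": 100}}])
print(update_body)
```

If any condition fails, Solr falls back to a regular atomic update, which reindexes the whole document.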

HTH,
Emir



> On 13 Feb 2018, at 13:43, mganeshs  wrote:
> 
> Hi,
> 
> I guess my point is not conceived correctly. 
> 
> Here I am talking about the field  "In Place Updates
> 
>  
> "
> 
> As per above link, it says that complete document will not be re-indexed
> during updates, if the field is set as docValues="true" and indexed and
> stored is set as false.
> 
> But I want to know whether complete document will re-index, when I delete a
> field of type "docvalue" is set as true, but indexed and stored is set as
> false. Also when I add new field of type "docvalue"is set as true, but
> indexed and stored is set as false. 
> 
> Hope my question is clear now. 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-13 Thread mganeshs
Hi,

I guess my point did not come across correctly.

Here I am talking about "In-Place Updates".

As per that documentation, the complete document will not be re-indexed
during updates if the field has docValues="true" and both indexed and
stored set to false.

But I want to know whether the complete document will be re-indexed when I
delete a field that has docValues set to true and indexed and stored set to
false, and likewise when I add a new field with docValues set to true and
indexed and stored set to false.

I hope my question is clear now.





Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-13 Thread Emir Arnautović
Whenever you send a doc for indexing, it is indexed completely, and the old
document with the same id (if one exists) is just flagged as deleted. It will
be removed from the index when the segment in which it is stored is merged. In
the case of large segments, that might be never.

The safest option is to do a full reindex into a fresh collection once you
change the schema.

Emir



> On 13 Feb 2018, at 11:34, mganeshs  wrote:
> 
> Hi,
> 
> Thanks for quick response. 
> 
> I forgot to mention that after adding it, I have re-indexed all the data
> with dynamic fields Field_one, Field_two etc. 
> 
> In that case, by adding new field ( docvalue field ) or removing existing
> docvalue field, Will the whole document will re-indexed again, or only this
> field alone will be deleted and added correspondingly.
> 
> Regards,
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-13 Thread mganeshs
Hi,

Thanks for the quick response.

I forgot to mention that after adding it, I re-indexed all the data
with the dynamic fields Field_one, Field_two, etc.

In that case, when adding a new docValues field or removing an existing
docValues field, will the whole document be re-indexed, or will only this
field be deleted or added accordingly?

Regards,





Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-13 Thread Emir Arnautović
Hi,
Changing the schema will not do anything by itself. After the changes are
applied (the core reloaded, if you did not use the API to update the schema),
Solr will use the new schema to index new documents. What matters is what you
had in the index before the schema update. So if you had defined Field_one as a
string, or you had it as a number but never used that field, deleting it and
letting the dynamic field handle it from that moment on should be fine. The
same goes for adding a new field: if you add a new definition for Field_100
before you have used it, it will be OK.

HTH,
Emir



> On 13 Feb 2018, at 11:03, mganeshs  wrote:
> 
> Hi,
> 
> If I have set following in the schema
> 
>  stored="false" docValues="true"/>
> 
> What will be the impact of deleting a single field, "Fields_one" field or
> what's the impact of adding a new field "Fields_100" ?
> 
> Will the whole document will re-indexed again, or only this field alone will
> be deleted and added correspondingly.
> 
> Idea here is we are trying to avoid complete re-indexing of document ( as
> document would be very huge one and number of documents are also in huge,
> and we have a situation, where we may need to add one new dynamic field to
> all the documents or to remove a dynamic from all the documents ).
> 
> Early responses are really appreciated !
> 
> Regards,
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: docValues with stored and useDocValuesAsStored

2018-01-08 Thread Shalin Shekhar Mangar
Hi Bernd,

If Solr can fetch a field from both stored and docValues, then it
chooses docValues only if that field is single-valued and doing so allows
Solr to avoid accessing the stored document altogether for *all*
fields to be returned. Otherwise stored values are preferred. This has been
the behavior since 7.1.
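
One way to read that rule is as a small decision function. The sketch below is only a model of the statement above, not Solr's actual code: `choose_sources` and the schema dictionary are hypothetical, it assumes useDocValuesAsStored is left at its default, and it assumes every requested field is stored, has docValues, or both.

```python
def choose_sources(requested, schema):
    """Map each requested field to "docValues" or "stored" per the rule above."""

    def forces_stored_access(props):
        # A stored field must come from the stored document if it has no
        # docValues, or if it is multi-valued.
        return props["stored"] and (not props["docValues"] or props["multiValued"])

    needs_stored_doc = any(forces_stored_access(schema[name]) for name in requested)

    sources = {}
    for name in requested:
        props = schema[name]
        if props["stored"] and props["docValues"]:
            # docValues win only when the stored document can be skipped entirely.
            sources[name] = "stored" if needs_stored_doc else "docValues"
        elif props["docValues"]:
            sources[name] = "docValues"
        else:
            sources[name] = "stored"
    return sources


# Hypothetical schema used only for this illustration.
schema = {
    "price": {"stored": True, "docValues": True, "multiValued": False},
    "tags": {"stored": True, "docValues": True, "multiValued": True},
    "title": {"stored": True, "docValues": False, "multiValued": False},
    "rank": {"stored": False, "docValues": True, "multiValued": False},
}

print(choose_sources(["price", "rank"], schema))
print(choose_sources(["price", "title"], schema))
```

In the second call, `title` forces a read of the stored document anyway, so under this reading `price` is taken from there as well.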

On Mon, Jan 8, 2018 at 2:25 PM, Bernd Fehling
 wrote:
> What is the precedence when docValues with stored=true is used?
> e.g.
>  docValues="true" />
>
> My guess, because of useDocValuesAsStored=true is default, that stored=true is
> ignored and the values are pulled from docValues.
>
> And only if useDocValuesAsStored=false is explicitly used then stored=true 
> comes
> into play.
>
> Or short, useDocValuesAsStored=true (the default) has precedence over 
> stored=true.
> I this right?
>
> Regards
> Bernd



-- 
Regards,
Shalin Shekhar Mangar.


Re: DocValues for multivalued strings and boolean fields

2017-12-21 Thread Shawn Heisey

On 12/20/2017 6:09 PM, S G wrote:

> One of our Solr users is trying to set docValues="true" for multivalued
> string fields and boolean-type fields.
>
> I am not sure what the performance impact of that would be.
> Can docValues negatively affect performance in any way?


Adding to what Emir said:

The docValues data will be the same as stored data, but it will be 
uncompressed, and written in such a way that Lucene can read all values 
for one field simply by reading data off the disk, no computations or 
seeks within the file are required.
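
The difference in layout can be pictured with a toy model. This is only an analogy in plain Python, not how Lucene encodes either structure on disk: stored fields behave like one record per document, docValues like one array per field indexed by document number. The sample documents are made up.

```python
docs = [
    {"id": "a", "price": 30, "title": "third"},
    {"id": "b", "price": 10, "title": "first"},
    {"id": "c", "price": 20, "title": "second"},
]

# Row-oriented (like stored fields): one record per document.
rows = docs

# Column-oriented (like docValues): one array per field, indexed by doc number.
columns = {field: [doc[field] for doc in docs] for field in ("id", "price", "title")}

# Sorting by price only needs the "price" column, never the full records.
price = columns["price"]
order = sorted(range(len(price)), key=price.__getitem__)
sorted_ids = [columns["id"][i] for i in order]
print(sorted_ids)
```

Sorting, faceting and grouping all ask for one field across many documents, which is the access pattern the column layout serves.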


If the field is indexed and stored, then docValues will not be accessed 
during normal queries unless there is a sort parameter or a facet 
parameter that mentions a field with docValues.  If present, docValues 
data will be used for sorting and facets, otherwise indexed values will 
be used.  Usually, sorting or facets with docValues uses less memory and 
performs faster than the same operation without docValues.  If the 
machine has insufficient system RAM to effectively cache index data, the 
performance may not improve.


When docValues is added to a field, a complete reindex is required, or 
Solr will not work properly.


If a field that already contains docValues has a change in the setting 
for multiValued, then that will require a reindex, but you must also 
take another step -- completely wiping the index directory before 
reloading or restarting.  If the wipe doesn't happen in this situation, 
then the core is going to completely break and throw exceptions.


Thanks,
Shawn


Re: DocValues for multivalued strings and boolean fields

2017-12-21 Thread Emir Arnautović
Hi SG,
Doc values are another file to write, so indexing performance will suffer. In
theory, query performance will suffer too, because the alternative is an
in-memory structure (fieldCache and fieldValueCache). In practice it will not,
because the in-memory structure requires a larger heap and takes time and
resources to build after each commit or on the first query, and it is likely
that the doc values files will be cached by the OS, so access will not be at
“disk speed”.

HTH,
Emir



> On 21 Dec 2017, at 02:09, S G  wrote:
> 
> Hi,
> 
> One of our Solr users is trying to set docValues="true" for multivalued
> string fields and boolean-type fields.
> 
> I am not sure what the performance impact of that would be.
> Can docValues negatively affect performance in any way?
> 
> We are using Solr 6.5.1 and also experimenting with 7.1.0
> 
> Thanks
> SG



Re: docValues

2017-11-25 Thread Erick Erickson
bq: ... using /select handler is the only solution to get data with
fields other than docValues that I visualized.

true, if it's not docValues then you need to do "something else", and
/select works.

bq: to allow the researcher to download the data so he can dive

Ah, I understand now. I suspect the interesting question for you will
be how much information to include. My _guess_ is that you still don't
particularly want to include lots of text, perhaps title, some kind of
reference number or publication information and the like. But as you
say, it'll be a tradeoff.

You probably will _not_ want to return the whole data set if you wind
up using the select handler because if you have, say, 10M rows it'll
assemble the _entire_ 10M docs in a single packet, too large for
practicality. There'll be some kind of paging involved.

Streaming doesn't have that same limitation though.
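
For the /select route, the paging could look roughly like the cursorMark loop below. This is a sketch under stated assumptions: `fetch_page` stands in for whatever HTTP call you make to the select handler, and `make_fake_fetch` is a made-up in-memory substitute so the loop can be run without a Solr instance. The parts that reflect Solr's documented cursor behaviour are that the first request sends cursorMark=*, the sort includes the uniqueKey field, and iteration stops when the returned nextCursorMark equals the one that was sent.

```python
def iter_all_docs(fetch_page, query="*:*", rows=2, sort="id asc"):
    """Yield every matching doc, one page at a time, using cursorMark."""
    cursor = "*"
    while True:
        page = fetch_page(
            {"q": query, "rows": rows, "sort": sort, "cursorMark": cursor}
        )
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # cursor did not advance: no more results
            break
        cursor = next_cursor


def make_fake_fetch(all_docs):
    """In-memory stand-in for a Solr request; the cursor is just the last id seen."""
    ordered = sorted(all_docs, key=lambda d: d["id"])

    def fetch_page(params):
        cursor = params["cursorMark"]
        if cursor == "*":
            remaining = ordered
        else:
            remaining = [d for d in ordered if d["id"] > cursor]
        docs = remaining[: params["rows"]]
        next_cursor = docs[-1]["id"] if docs else cursor
        return {"response": {"docs": docs}, "nextCursorMark": next_cursor}

    return fetch_page


sample = [{"id": i} for i in ("c", "a", "e", "b", "d")]
ids = [doc["id"] for doc in iter_all_docs(make_fake_fetch(sample))]
print(ids)
```

In a real client the cursor value is an opaque string returned by Solr; only the fake fetcher here treats it as an id.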

Best,
Erick

On Fri, Nov 24, 2017 at 11:46 AM, Kojo  wrote:
> Erick,
> thanks for explaining the memory aspects.
>
> Regarding the end user perspective, our intention is to provide a first
> layer of filtering, where data will be rolled up in some buckets and be
> displayed in charts and tables.
> When I told about provide access to "full" documents, it was not to display
> on the web, but to allow the researcher to download the data so he can dive
> into the data with his own tools (R, spss, whatever).
>
> With this in mind, using /select handler is the only solution to get data
> with fields other than docValues that I visualized.
>
> Now that I have a little bit more clear that memory will not be hardly
> affected if I use docValues, I will start to think about disk usage grow
> and how much it impacts the infrastructure.
>
> Thanks again,
>
>
>
>
>
>
>
>
>
> 2017-11-24 16:16 GMT-02:00 Erick Erickson :
>
>> Kojo:
>>
>> bq: My question is, isn´t it to
>> expensive in terms of memory consumption to enable docValues on fields that
>> I dont need to facet, search etc?
>>
>> Well, yes and no. The memory consumed is your OS memory space and a
>> small bit of control structures on your Java heap. It's a bit scary
>> that your _index_ size will increase significantly on disk, but your
>> Java heap requirements won't be correspondingly large.
>>
>> But there's a bigger issue here. Streaming is built to handle very
>> large result sets in a map/reduce style form, i.e. subdivide the work
>> amongst lots of nodes. If you want to return _all_ the records to the
>> user along with description information and the like, what are they
>> going to do with them? 10,000,000 rows (small by some streaming
>> operations standards) is far too many to, say, display in a browser.
>> And it's an anti-pattern to ask for, say, 10,000,000 rows with the
>> select handler.
>>
>> You can page through these results, but it'll take a long time. So
>> basically my question is whether this capability is useful enough to
>> spend time on. If it is and you are going to return lots of rows
>> consider paging through with cursorMark capabilities, see:
>> https://lucidworks.com/2013/12/12/coming-soon-to-solr-
>> efficient-cursor-based-iteration-of-large-result-sets/
>>
>> Best,
>> Erick
>>
>> On Fri, Nov 24, 2017 at 9:38 AM, Kojo  wrote:
>> > I Think that I found the solution. After analysis, change from /export
>> > request handler to /select request handler in order to obtain other
>> fields.
>> > I will try that.
>> >
>> >
>> >
>> > 2017-11-24 15:15 GMT-02:00 Kojo :
>> >
>> >> Thank you very much for your answer, Shawn.
>> >>
>> >> That is it, I was looking for another way to include fields non
>> docValues
>> >> to the filtered result documents.
>> >> I can enable docValues to other fields and reindex all if necessary. I
>> >> will tell you about the use case, because I am not sure  that I am on
>> the
>> >> right track.
>> >>
>> >> As I said before, I am using Streaming Expressions to deal with
>> different
>> >> collections. Up to this moment, it is decided that we will use this
>> >> approach.
>> >>
>> >> The goal is to provide our users a web interface where they can make
>> some
>> >> queries. The backend will get Solr data using the Streaming Expressions
>> >> rest api and will return rolled up data to the frontend, which will
>> display
>> >> some charts and aggregated data.
>> >> After that, the end user may want to have data used to generate this
>> >> aggregated information (not all fields of the filtered documents, but
>> the
>> >> fields used to aggregate information), combined with some other fields
>> >> (title, description of document for example) which are not docValues. As
>> >> you said I need to add docValues to then. My question is, isn´t it to
>> >> expensive in terms of memory consumption to enable docValues on fields
>> that
>> >> I dont need to facet, search etc?
>> >>
>> >> I think that to reconstruct a standard query that achieves the results
>> >> from a 

Re: docValues

2017-11-24 Thread Kojo
Erick,
thanks for explaining the memory aspects.

Regarding the end-user perspective, our intention is to provide a first
layer of filtering, where data will be rolled up into some buckets and
displayed in charts and tables.
When I talked about providing access to "full" documents, it was not to display
them on the web, but to allow the researcher to download the data so he can dive
into it with his own tools (R, SPSS, whatever).

With this in mind, using the /select handler is the only solution I could see
for getting data with fields other than docValues fields.

Now that it is a bit clearer to me that memory will hardly be
affected if I use docValues, I will start to think about disk usage growth
and how much it impacts the infrastructure.

Thanks again,









2017-11-24 16:16 GMT-02:00 Erick Erickson :

> Kojo:
>
> bq: My question is, isn´t it to
> expensive in terms of memory consumption to enable docValues on fields that
> I dont need to facet, search etc?
>
> Well, yes and no. The memory consumed is your OS memory space and a
> small bit of control structures on your Java heap. It's a bit scary
> that your _index_ size will increase significantly on disk, but your
> Java heap requirements won't be correspondingly large.
>
> But there's a bigger issue here. Streaming is built to handle very
> large result sets in a map/reduce style form, i.e. subdivide the work
> amongst lots of nodes. If you want to return _all_ the records to the
> user along with description information and the like, what are they
> going to do with them? 10,000,000 rows (small by some streaming
> operations standards) is far too many to, say, display in a browser.
> And it's an anti-pattern to ask for, say, 10,000,000 rows with the
> select handler.
>
> You can page through these results, but it'll take a long time. So
> basically my question is whether this capability is useful enough to
> spend time on. If it is and you are going to return lots of rows
> consider paging through with cursorMark capabilities, see:
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-
> efficient-cursor-based-iteration-of-large-result-sets/
>
> Best,
> Erick
>
> On Fri, Nov 24, 2017 at 9:38 AM, Kojo  wrote:
> > I Think that I found the solution. After analysis, change from /export
> > request handler to /select request handler in order to obtain other
> fields.
> > I will try that.
> >
> >
> >
> > 2017-11-24 15:15 GMT-02:00 Kojo :
> >
> >> Thank you very much for your answer, Shawn.
> >>
> >> That is it, I was looking for another way to include fields non
> docValues
> >> to the filtered result documents.
> >> I can enable docValues to other fields and reindex all if necessary. I
> >> will tell you about the use case, because I am not sure  that I am on
> the
> >> right track.
> >>
> >> As I said before, I am using Streaming Expressions to deal with
> different
> >> collections. Up to this moment, it is decided that we will use this
> >> approach.
> >>
> >> The goal is to provide our users a web interface where they can make
> some
> >> queries. The backend will get Solr data using the Streaming Expressions
> >> rest api and will return rolled up data to the frontend, which will
> display
> >> some charts and aggregated data.
> >> After that, the end user may want to have data used to generate this
> >> aggregated information (not all fields of the filtered documents, but
> the
> >> fields used to aggregate information), combined with some other fields
> >> (title, description of document for example) which are not docValues. As
> >> you said I need to add docValues to then. My question is, isn´t it to
> >> expensive in terms of memory consumption to enable docValues on fields
> that
> >> I dont need to facet, search etc?
> >>
> >> I think that to reconstruct a standard query that achieves the results
> >> from a complex Streaming Expression is not simple. This is why I want to
> >> use the same query used to make analysis, to return full data via export
> >> handler.
> >>
> >> I am sorry if this is so much confusing.
> >>
> >> Thank you,
> >>
> >>
> >>
> >>
> >> 2017-11-24 12:36 GMT-02:00 Shawn Heisey :
> >>
> >>> On 11/23/2017 1:51 PM, Kojo wrote:
> >>>
> >>>> I am working on Solr to develop a tool for analysis. I am using the
> >>>> search function of Streaming Expressions, which requires a field to be
> >>>> indexed with docValues enabled, so I can get it.
> >>>>
> >>>> Suppose that someone finishes the analysis and would like to get
> >>>> other fields of the result set that are not docValues enabled. How can
> >>>> it be done?
> >>>>
> >>>
> >>> We did get this message, but it's confusing as to exactly what you're
> >>> asking, which is why nobody responded.
> >>>
> >>> If you're saying that this theoretical person wants to use another
> field
> >>> with the streaming expression analysis you have provided, and that
> field
> >>> 

Re: docValues

2017-11-24 Thread Erick Erickson
Kojo:

bq: My question is, isn't it too
expensive in terms of memory consumption to enable docValues on fields that
I don't need to facet, search, etc.?

Well, yes and no. The memory consumed is your OS memory space and a
small bit of control structures on your Java heap. It's a bit scary
that your _index_ size will increase significantly on disk, but your
Java heap requirements won't be correspondingly large.

But there's a bigger issue here. Streaming is built to handle very
large result sets in a map/reduce style form, i.e. subdivide the work
amongst lots of nodes. If you want to return _all_ the records to the
user along with description information and the like, what are they
going to do with them? 10,000,000 rows (small by some streaming
operations standards) is far too many to, say, display in a browser.
And it's an anti-pattern to ask for, say, 10,000,000 rows with the
select handler.

You can page through these results, but it'll take a long time. So
basically my question is whether this capability is useful enough to
spend time on. If it is and you are going to return lots of rows
consider paging through with cursorMark capabilities, see:
https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
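
For what it's worth, the cursorMark loop can be sketched roughly as below. This is only an illustration under stated assumptions: Page, fetchAll and fakeSolr are hypothetical names, and the in-memory fake stands in for a real /select request. What it does mirror is the documented contract: send cursorMark=* on the first request (with a sort that includes the uniqueKey field), pass the returned nextCursorMark on each following request, and stop when the returned value equals the one that was sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch of the cursorMark loop. Page and the fetch function are hypothetical
// stand-ins for a real Solr request/response; only the loop logic is the point.
class CursorPagingSketch {

    // One page of results plus the cursor to send with the next request.
    static final class Page {
        final List<String> docs;
        final String nextCursorMark;

        Page(List<String> docs, String nextCursorMark) {
            this.docs = docs;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Start with "*" and keep requesting until the returned cursor equals
    // the one that was sent, which signals the end of the result set.
    static List<String> fetchAll(Function<String, Page> fetch) {
        List<String> all = new ArrayList<>();
        String cursor = "*";
        while (true) {
            Page page = fetch.apply(cursor);
            all.addAll(page.docs);
            if (page.nextCursorMark.equals(cursor)) {
                break;
            }
            cursor = page.nextCursorMark;
        }
        return all;
    }

    // In-memory fake: serves data in pages of `rows`, using the offset as the
    // cursor value. Real cursor marks are opaque strings, not offsets.
    static Function<String, Page> fakeSolr(List<String> data, int rows) {
        return cursor -> {
            int start = cursor.equals("*") ? 0 : Integer.parseInt(cursor);
            int end = Math.min(start + rows, data.size());
            // When nothing is left, echo the cursor back unchanged.
            String next = (start >= data.size()) ? cursor : String.valueOf(end);
            return new Page(data.subList(start, end), next);
        };
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        System.out.println(fetchAll(fakeSolr(data, 2)));
    }
}
```

Each page costs one request, so the total time still grows with the number of rows, which is the point made above about paging taking a long time.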

Best,
Erick

On Fri, Nov 24, 2017 at 9:38 AM, Kojo  wrote:
> I Think that I found the solution. After analysis, change from /export
> request handler to /select request handler in order to obtain other fields.
> I will try that.
>
>
>
> 2017-11-24 15:15 GMT-02:00 Kojo :
>
>> Thank you very much for your answer, Shawn.
>>
>> That is it, I was looking for another way to include fields non docValues
>> to the filtered result documents.
>> I can enable docValues to other fields and reindex all if necessary. I
>> will tell you about the use case, because I am not sure  that I am on the
>> right track.
>>
>> As I said before, I am using Streaming Expressions to deal with different
>> collections. Up to this moment, it is decided that we will use this
>> approach.
>>
>> The goal is to provide our users a web interface where they can make some
>> queries. The backend will get Solr data using the Streaming Expressions
>> rest api and will return rolled up data to the frontend, which will display
>> some charts and aggregated data.
>> After that, the end user may want to have data used to generate this
>> aggregated information (not all fields of the filtered documents, but the
>> fields used to aggregate information), combined with some other fields
>> (title, description of document for example) which are not docValues. As
>> you said I need to add docValues to then. My question is, isn´t it to
>> expensive in terms of memory consumption to enable docValues on fields that
>> I dont need to facet, search etc?
>>
>> I think that to reconstruct a standard query that achieves the results
>> from a complex Streaming Expression is not simple. This is why I want to
>> use the same query used to make analysis, to return full data via export
>> handler.
>>
>> I am sorry if this is so much confusing.
>>
>> Thank you,
>>
>>
>>
>>
>> 2017-11-24 12:36 GMT-02:00 Shawn Heisey :
>>
>>> On 11/23/2017 1:51 PM, Kojo wrote:
>>>
>>>> I am working on Solr to develop a tool for analysis. I am using the
>>>> search function of Streaming Expressions, which requires a field to be
>>>> indexed with docValues enabled, so I can get it.
>>>>
>>>> Suppose that someone finishes the analysis and would like to get
>>>> other fields of the result set that are not docValues enabled. How can
>>>> it be done?
>>>>

>>>
>>> We did get this message, but it's confusing as to exactly what you're
>>> asking, which is why nobody responded.
>>>
>>> If you're saying that this theoretical person wants to use another field
>>> with the streaming expression analysis you have provided, and that field
>>> does not have docValues, then you'll need to add docValues to the field and
>>> completely reindex.
>>>
>>> If you're asking something else, then you're going to need to provide
>>> more details so we can actually know what you want to have happen.
>>>
>>> Thanks,
>>> Shawn
>>>
>>
>>


Re: docValues

2017-11-24 Thread Kojo
I Think that I found the solution. After analysis, change from /export
request handler to /select request handler in order to obtain other fields.
I will try that.



2017-11-24 15:15 GMT-02:00 Kojo :

> Thank you very much for your answer, Shawn.
>
> That is it, I was looking for another way to include fields non docValues
> to the filtered result documents.
> I can enable docValues to other fields and reindex all if necessary. I
> will tell you about the use case, because I am not sure  that I am on the
> right track.
>
> As I said before, I am using Streaming Expressions to deal with different
> collections. Up to this moment, it is decided that we will use this
> approach.
>
> The goal is to provide our users a web interface where they can make some
> queries. The backend will get Solr data using the Streaming Expressions
> rest api and will return rolled up data to the frontend, which will display
> some charts and aggregated data.
> After that, the end user may want to have data used to generate this
> aggregated information (not all fields of the filtered documents, but the
> fields used to aggregate information), combined with some other fields
> (title, description of document for example) which are not docValues. As
> you said I need to add docValues to then. My question is, isn´t it to
> expensive in terms of memory consumption to enable docValues on fields that
> I dont need to facet, search etc?
>
> I think that to reconstruct a standard query that achieves the results
> from a complex Streaming Expression is not simple. This is why I want to
> use the same query used to make analysis, to return full data via export
> handler.
>
> I am sorry if this is so much confusing.
>
> Thank you,
>
>
>
>
> 2017-11-24 12:36 GMT-02:00 Shawn Heisey :
>
>> On 11/23/2017 1:51 PM, Kojo wrote:
>>
>>> I am working on Solr to develop a tool for analysis. I am using the
>>> search function of Streaming Expressions, which requires a field to be
>>> indexed with docValues enabled, so I can get it.
>>>
>>> Suppose that someone finishes the analysis and would like to get
>>> other fields of the result set that are not docValues enabled. How can
>>> it be done?
>>>
>>
>> We did get this message, but it's confusing as to exactly what you're
>> asking, which is why nobody responded.
>>
>> If you're saying that this theoretical person wants to use another field
>> with the streaming expression analysis you have provided, and that field
>> does not have docValues, then you'll need to add docValues to the field and
>> completely reindex.
>>
>> If you're asking something else, then you're going to need to provide
>> more details so we can actually know what you want to have happen.
>>
>> Thanks,
>> Shawn
>>
>
>


Re: docValues

2017-11-24 Thread Kojo
Thank you very much for your answer, Shawn.

That is it: I was looking for another way to include non-docValues fields
in the filtered result documents.
I can enable docValues on other fields and reindex everything if necessary.
I will tell you about the use case, because I am not sure that I am on the
right track.

As I said before, I am using Streaming Expressions to deal with different
collections. So far, it has been decided that we will use this approach.

The goal is to provide our users with a web interface where they can run
some queries. The backend will get Solr data using the Streaming Expressions
REST API and will return rolled-up data to the frontend, which will display
some charts and aggregated data.
After that, the end user may want the data used to generate this
aggregated information (not all fields of the filtered documents, but the
fields used to aggregate information), combined with some other fields
(title and description of the document, for example) which are not docValues.
As you said, I need to add docValues to them. My question is, isn't it too
expensive in terms of memory consumption to enable docValues on fields that
I don't need to facet, search, etc.?

I think that reconstructing a standard query that achieves the results of
a complex Streaming Expression is not simple. This is why I want to use the
same query used for the analysis to return the full data via the export
handler.

I am sorry if this is confusing.

Thank you,




2017-11-24 12:36 GMT-02:00 Shawn Heisey :

> On 11/23/2017 1:51 PM, Kojo wrote:
>
>> I am working on Solr to develop a tool for analysis. I am using the search
>> function of Streaming Expressions, which requires a field to be indexed
>> with docValues enabled, so I can get it.
>>
>> Suppose that someone finishes the analysis and would like to get
>> other fields of the result set that are not docValues enabled. How can it
>> be done?
>>
>
> We did get this message, but it's confusing as to exactly what you're
> asking, which is why nobody responded.
>
> If you're saying that this theoretical person wants to use another field
> with the streaming expression analysis you have provided, and that field
> does not have docValues, then you'll need to add docValues to the field and
> completely reindex.
>
> If you're asking something else, then you're going to need to provide more
> details so we can actually know what you want to have happen.
>
> Thanks,
> Shawn
>


Re: docValues

2017-11-24 Thread Shawn Heisey

On 11/23/2017 1:51 PM, Kojo wrote:

> I am working on Solr to develop a tool for analysis. I am using the search
> function of Streaming Expressions, which requires a field to be indexed
> with docValues enabled, so I can get it.
>
> Suppose that someone finishes the analysis and would like to get
> other fields of the result set that are not docValues enabled. How can it be
> done?


We did get this message, but it's confusing as to exactly what you're 
asking, which is why nobody responded.


If you're saying that this theoretical person wants to use another field 
with the streaming expression analysis you have provided, and that field 
does not have docValues, then you'll need to add docValues to the field 
and completely reindex.


If you're asking something else, then you're going to need to provide 
more details so we can actually know what you want to have happen.


Thanks,
Shawn


Re: DocValues

2017-11-17 Thread S G
Thank you Erick and Shawn.


1) So it seems that docValues should always be preferred over stored fields
for retrieval, as long as the sorting of multiValued fields is not a
concern. Is that a correct understanding?


2) Also, in-place atomic updates (with docValues=true, stored=false and
indexed=false) should be much faster than regular atomic updates (with
stored=true or indexed=true, whatever docValues is set to).
That is because an in-place update just looks up the document's entry in
the column-oriented docValues structure and changes the value there. The
document itself is not re-indexed, because the field is neither stored nor
indexed. If there is any benchmark to verify this, it would be great.


3) If search performance is dreadful on fields with docValues=true and
indexed=false, why is that even allowed? Shouldn't Solr just give an error
in such cases?


Thanks
SG




On Fri, Nov 17, 2017 at 6:50 AM, Erick Erickson 
wrote:

> I'll add that using docValues in place of stored is much more
> efficient than using stored. To access stored=true data
> 1> a 16K block must be read from disk
> 2> the 16K block must be decompressed.
>
> With docValues, the value is a simple lookup, the value is probably in
> memory already (MMapped) and the decompression of a large block is
> unnecessary.
>
> There is one caveat: docValues uses (for multiValued fields) a
> SORTED_SET. Therefore multiple identical values are collapsed and the
> values are sorted. So if your input was
> 5, 6, 3, 4, 3, 3, 3
> the retrieved values would be
> 3, 4, 5, 6
>
> If this is NOT ok for your app, then you should use stored values to
> retrieve. Otherwise DocValues is preferred.
>
> Best,
> Erick
>
> On Fri, Nov 17, 2017 at 5:44 AM, Shawn Heisey  wrote:
> > On 11/17/2017 12:53 AM, S G wrote:
> >>
> >> Going through
> >>
> >> https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html
> >> ,
> >> is it possible to enable only docValues and disable stored/indexed
> >> attributes for a field?
> >
> >
> > Yes, this is possible.  In fact, if you want to do in-place Atomic updates,
> > this is how the field must be set up.
> >
> > https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
> >
> >> In that case, the field will become only sortable/facetable/pivotable but
> >> it cannot be searched nor can it be retrieved?
> >
> >
> > Recent Solr versions can use docValues instead of stored when retrieving
> > data for results.  This can be turned on/off on a per-field basis.  The
> > default setting is enabled if you're using a current schema version.
> >
> > https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-RetrievingDocValuesDuringSearch
> >
> > As I understand it, you actually *can* search docValues-only fields (which
> > would require a match to the entire field -- no text analysis), but because
> > it works similarly to a full-table scan in a database, the performance is
> > dreadful on most fields, and it's NOT recommended.
> >
> > Thanks,
> > Shawn
>


Re: DocValues

2017-11-17 Thread Erick Erickson
I'll add that using docValues in place of stored is much more
efficient than using stored. To access stored=true data
1> a 16K block must be read from disk
2> the 16K block must be decompressed.

With docValues, the value is a simple lookup, the value is probably in
memory already (MMapped) and the decompression of a large block is
unnecessary.

There is one caveat: docValues uses (for multiValued fields) a
SORTED_SET. Therefore multiple identical values are collapsed and the
values are sorted. So if your input was
5, 6, 3, 4, 3, 3, 3
the retrieved values would be
3, 4, 5, 6

If this is NOT ok for your app, then you should use stored values to
retrieve. Otherwise DocValues is preferred.
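
To make the caveat concrete, here is a minimal sketch of the effect using a plain java.util.TreeSet. It is not Solr or Lucene code; the class and method names are made up for illustration. It only mimics what SORTED_SET does to the values of a multiValued field: duplicates are collapsed and the original order is replaced by sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustration only: mimics the dedupe-and-sort effect of SORTED_SET
// docValues with a plain TreeSet. This is not Solr or Lucene code.
class SortedSetDocValuesSketch {

    // Returns the values the way a multiValued docValues field would hand
    // them back: duplicates collapsed, sorted, original order lost.
    static List<Integer> asRetrievedFromDocValues(List<Integer> indexedValues) {
        return new ArrayList<>(new TreeSet<>(indexedValues));
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 6, 3, 4, 3, 3, 3);
        System.out.println("as indexed (what stored would return): " + input);
        System.out.println("from docValues: " + asRetrievedFromDocValues(input));
    }
}
```

If the application needs the repeated 3s or the original 5, 6, 3, 4 ordering, that information is only available from the stored copy.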

Best,
Erick

On Fri, Nov 17, 2017 at 5:44 AM, Shawn Heisey  wrote:
> On 11/17/2017 12:53 AM, S G wrote:
>>
>> Going through
>>
>> https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html
>> ,
>> is it possible to enable only docValues and disable stored/indexed
>> attributes for a field?
>
>
> Yes, this is possible.  In fact, if you want to do in-place Atomic updates,
> this is how the field must be set up.
>
> https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
>
>> In that case, the field will become only sortable/facetable/pivotable but
>> it cannot be searched nor can it be retrieved?
>
>
> Recent Solr versions can use docValues instead of stored when retrieving
> data for results.  This can be turned on/off on a per-field basis.  The
> default setting is enabled if you're using a current schema version.
>
> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-RetrievingDocValuesDuringSearch
>
> As I understand it, you actually *can* search docValues-only fields (which
> would require a match to the entire field -- no text analysis), but because
> it works similarly to a full-table scan in a database, the performance is
> dreadful on most fields, and it's NOT recommended.
>
> Thanks,
> Shawn


Re: DocValues

2017-11-17 Thread Shawn Heisey

On 11/17/2017 12:53 AM, S G wrote:

> Going through
> https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html
> ,
> is it possible to enable only docValues and disable stored/indexed
> attributes for a field?


Yes, this is possible.  In fact, if you want to do in-place Atomic 
updates, this is how the field must be set up.


https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates


> In that case, the field will become only sortable/facetable/pivotable but
> it cannot be searched nor can it be retrieved?


Recent Solr versions can use docValues instead of stored when retrieving 
data for results.  This can be turned on/off on a per-field basis.  The 
default setting is enabled if you're using a current schema version.


https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-RetrievingDocValuesDuringSearch

As I understand it, you actually *can* search docValues-only fields 
(which would require a match to the entire field -- no text analysis), 
but because it works similarly to a full-table scan in a database, the 
performance is dreadful on most fields, and it's NOT recommended.


Thanks,
Shawn


Re: DocValues

2017-11-16 Thread S G
Going through
https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html
,
is it possible to enable only docValues and disable stored/indexed
attributes for a field?

In that case, the field will become only sortable/facetable/pivotable but
it cannot be searched nor can it be retrieved?
I am guessing that stored comes naturally when a field has docValues
enabled.
Is that a correct understanding?

Thanks
SG



On Thu, Nov 16, 2017 at 11:48 PM, S G  wrote:

> Hi,
>
> I am trying to understand docValues.
>
> Almost every link I have gone through says that enable docValues if you
> want to sort/facet/pivot.
> Does that mean I should enable docValues even if I just want to index and
> store simple integer-type fields?
> If that is true, then the default numeric fields will not work for me as
> they have docValues=true.
>
> Is it recommended to create my own fields when I do not want to
> sort/facet/pivot but only want to index and store?
>
> Thanks
> SG
>


Re: DocValues error when upgrading to 6.6.1 from 6.5

2017-10-03 Thread Xie, Sean
I have figured out the problem. The schema was changed, and the index has been
deleted and rebuilt since then, but the index files might still have contained
old, stale segments.

I reproduced the situation by restoring the old data on 6.5, then running an
optimize, then upgrading to 6.6.1, and the error was gone.

Thanks
Sean

On 9/22/17, 12:16 AM, "Erick Erickson"  wrote:

This error is not about DocValuesAsStored, but about
multiValued=true|false. It indicates that multiValued is set to
"false" for the current index but "true" in the new schema. At least
that's my guess

Best,
Erick

On Thu, Sep 21, 2017 at 11:56 AM, Xie, Sean  wrote:
> Hi,
>
> When I upgrade the existing SOLR from 6.5.1 to 6.6.1, I’m getting:
> cannot change DocValues type from SORTED to SORTED_SET for field “…..”
>
> During the upgrades, there is no change on schema and schema version (we
are using schema version 1.5, so the useDocValuesAsStored defaults are not
taking effect).
>
> Not sure why this is happening.
>
> Planning to upgrade the SOLR version on other clusters, but don’t really 
want to do re-index for all the data.
>
> Any suggestion?
>
> Thanks
> Sean
>
> Confidentiality Notice::  This email, including attachments, may include 
non-public, proprietary, confidential or legally privileged information.  If 
you are not an intended recipient or an authorized agent of an intended 
recipient, you are hereby notified that any dissemination, distribution or 
copying of the information contained in or transmitted with this e-mail is 
unauthorized and strictly prohibited.  If you have received this email in 
error, please notify the sender by replying to this message and permanently 
delete this e-mail, its attachments, and any copies of it immediately.  You 
should not retain, copy or use this e-mail or any attachment for any purpose, 
nor disclose all or any part of the contents to any other person. Thank you.




Re: DocValues, Long and SolrJ

2017-09-27 Thread Emir Arnautović
I did not look at the code, but after deleting make sure all segments are gone 
(maybe optimize), make sure you reloaded the core and if nothing works (and 
this is the recommended solution) recreate your collection instead of deleting 
all documents. 

HTH,
Emir

> On 26 Sep 2017, at 23:04, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> 
> I get it after I have deleted the index with a delete query and start trying 
> to populate it again with new documents. The error occurs when the indexer 
> tries to add a new document. And yes, I did change the schema before I 
> started the operation.
> 
> -----Original Message-----
> From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
> Sent: Tuesday, 26 September 2017 8:49 p.m.
> To: solr-user@lucene.apache.org
> Subject: Re: DocValues, Long and SolrJ
> 
> Hi Phil,
> Are you saying that you get this error when you create fresh core/collection? 
> This sort of errors are usually related to schema being changed after some 
> documents being indexed.
> 
> Thanks,
> Emir
> 
>> On 25 Sep 2017, at 23:42, Phil Scadden <p.scad...@gns.cri.nz> wrote:
>> 
>> I ran into a problem with indexing documents which I worked around by 
>> changing data type, but I am curious as to how the setup could be made to 
>> work.
>> 
>> Solr 6.5.1 - Field type Long, multivalued false, DocValues.
>> 
>> In indexing with Solr, I set the value of field with:
>>   Long accessLevel
>>   ...
>>   accessLevel = qury.val(1);
>>   ...
>>   Document.addField("access", accessLevel);
>> 
>> Solr fails to add the document with this message:
>> 
>> "cannot change DocValues type from SORTED_SET to NUMERIC for field"
>> 
>> ??? So how do you configure a single-valued Long type?
>> Notice: This email and any attachments are confidential and may not be used, 
>> published or redistributed without the prior written consent of the 
>> Institute of Geological and Nuclear Sciences Limited (GNS Science). If 
>> received in error please destroy and immediately notify GNS Science. Do not 
>> copy or disclose the contents.
> 



RE: DocValues, Long and SolrJ

2017-09-26 Thread Phil Scadden
The delete for additions is done with:

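   // Queue size 10, 2 background threads (values from the original setup).
   ConcurrentUpdateSolrClient solr =
       new ConcurrentUpdateSolrClient(solrProperties.getServer(), 10, 2);
   try {
       // Delete every document, then commit so the deletes become visible.
       solr.deleteByQuery("*:*");
       solr.commit();
   } catch (SolrServerException | IOException ex) {
       // The exception is swallowed here, so a failed delete goes unnoticed.
   }

   // start the index rebuild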

-----Original Message-----
From: Phil Scadden [mailto:p.scad...@gns.cri.nz]
Sent: Wednesday, 27 September 2017 10:04 a.m.
To: solr-user@lucene.apache.org
Subject: RE: DocValues, Long and SolrJ

I get it after I have deleted the index with a delete query and start trying to 
populate it again with new documents. The error occurs when the indexer tries 
to add a new document. And yes, I did change the schema before I started the 
operation.

-----Original Message-----
From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
Sent: Tuesday, 26 September 2017 8:49 p.m.
To: solr-user@lucene.apache.org
Subject: Re: DocValues, Long and SolrJ

Hi Phil,
Are you saying that you get this error when you create fresh core/collection? 
This sort of errors are usually related to schema being changed after some 
documents being indexed.

Thanks,
Emir

> On 25 Sep 2017, at 23:42, Phil Scadden <p.scad...@gns.cri.nz> wrote:
>
> I ran into a problem with indexing documents which I worked around by 
> changing data type, but I am curious as to how the setup could be made to 
> work.
>
> Solr 6.5.1 - Field type Long, multivalued false, DocValues.
>
> In indexing with Solr, I set the value of field with:
>Long accessLevel
>...
>accessLevel = qury.val(1);
>...
>Document.addField("access", accessLevel);
>
> Solr fails to add the document with this message:
>
> "cannot change DocValues type from SORTED_SET to NUMERIC for field"
>
> ??? So how do you configure a single-valued Long type?


RE: DocValues, Long and SolrJ

2017-09-26 Thread Phil Scadden
I get it after I have deleted the index with a delete query and start trying to 
populate it again with new documents. The error occurs when the indexer tries 
to add a new document. And yes, I did change the schema before I started the 
operation.

-----Original Message-----
From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
Sent: Tuesday, 26 September 2017 8:49 p.m.
To: solr-user@lucene.apache.org
Subject: Re: DocValues, Long and SolrJ

Hi Phil,
Are you saying that you get this error when you create fresh core/collection? 
This sort of errors are usually related to schema being changed after some 
documents being indexed.

Thanks,
Emir

> On 25 Sep 2017, at 23:42, Phil Scadden <p.scad...@gns.cri.nz> wrote:
>
> I ran into a problem with indexing documents which I worked around by 
> changing data type, but I am curious as to how the setup could be made to 
> work.
>
> Solr 6.5.1 - Field type Long, multivalued false, DocValues.
>
> In indexing with Solr, I set the value of field with:
>Long accessLevel
>...
>accessLevel = qury.val(1);
>...
>Document.addField("access", accessLevel);
>
> Solr fails to add the document with this message:
>
> "cannot change DocValues type from SORTED_SET to NUMERIC for field"
>
> ??? So how do you configure a single-valued Long type?


Re: DocValues, Long and SolrJ

2017-09-26 Thread Emir Arnautović
Hi Phil,
Are you saying that you get this error when you create a fresh core/collection?
Errors of this sort are usually caused by the schema being changed after some
documents have been indexed.

Thanks,
Emir

> On 25 Sep 2017, at 23:42, Phil Scadden  wrote:
> 
> I ran into a problem with indexing documents which I worked around by 
> changing data type, but I am curious as to how the setup could be made to 
> work.
> 
> Solr 6.5.1 - Field type Long, multivalued false, DocValues.
> 
> In indexing with Solr, I set the value of field with:
>    Long accessLevel
>    ...
>    accessLevel = qury.val(1);
>    ...
>    Document.addField("access", accessLevel);
> 
> Solr fails to add the document with this message:
> 
> "cannot change DocValues type from SORTED_SET to NUMERIC for field"
> 
> ??? So how do you configure a single-valued Long type?



Re: DocValues error when upgrading to 6.6.1 from 6.5

2017-09-21 Thread Erick Erickson
This error is not about DocValuesAsStored, but about
multiValued=true|false. It indicates that multiValued is set to
"false" for the current index but "true" in the new schema. At least
that's my guess
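
To illustrate the guess, here is a small sketch of how, as far as I understand it, the docValues type follows from the field definition for StrField and Trie numeric fields in Solr 6.x. The enums and method names are hypothetical, this is not Lucene's actual check, and Point fields behave differently, so treat it as an aid to reading the error message rather than a reference.

```java
// Hedged illustration, not Lucene code: an assumed mapping from field
// definition to docValues type for StrField and Trie numeric fields.
class DocValuesTypeSketch {

    enum FieldKind { STRING, TRIE_NUMERIC }

    enum DvType { SORTED, SORTED_SET, NUMERIC }

    // multiValued fields of either kind end up as SORTED_SET; single-valued
    // strings are SORTED and single-valued Trie numerics are NUMERIC.
    static DvType docValuesType(FieldKind kind, boolean multiValued) {
        if (multiValued) {
            return DvType.SORTED_SET;
        }
        return kind == FieldKind.STRING ? DvType.SORTED : DvType.NUMERIC;
    }

    // Compares the type already present in the index segments with the type
    // implied by the current schema and words a mismatch like the error above.
    static String check(String field, DvType inIndex, DvType fromSchema) {
        if (inIndex == fromSchema) {
            return "ok";
        }
        return "cannot change DocValues type from " + inIndex + " to "
                + fromSchema + " for field \"" + field + "\"";
    }

    public static void main(String[] args) {
        // The field name "f" is a placeholder. A string field indexed as
        // single-valued, then switched to multiValued="true" in the schema.
        System.out.println(check("f",
                docValuesType(FieldKind.STRING, false),
                docValuesType(FieldKind.STRING, true)));
    }
}
```

The mismatch persists for as long as any segment written under the old definition is still part of the index.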

Best,
Erick

On Thu, Sep 21, 2017 at 11:56 AM, Xie, Sean  wrote:
> Hi,
>
> When I upgrade the existing SOLR from 6.5.1 to 6.6.1, I’m getting:
> cannot change DocValues type from SORTED to SORTED_SET for field “…..”
>
> During the upgrades, there is no change on schema and schema version (we are
> using schema version 1.5, so the useDocValuesAsStored defaults are not taking
> effect).
>
> Not sure why this is happening.
>
> Planning to upgrade the SOLR version on other clusters, but don’t really want 
> to do re-index for all the data.
>
> Any suggestion?
>
> Thanks
> Sean
>


Error in Solr 6.6 Example schemas re: DocValues for StrField type must be single-valued?

2017-08-15 Thread Tom Burton-West
Hello,

The comments in the example schemas for Solr 6.6 state that the
StrField type must be single-valued to support doc values.

For example
Solr-6.6.0/server/solr/configsets/basic_configs/conf/managed-schema:

216  

However, on line 221 a StrField is declared with docValues that is
multiValued:
221  

Also note that the comments above say that the field must either be
required or have a default value, but line 221 appears to satisfy neither
condition.

The JavaDocs indicate that StrField can be multi-valued:
https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/index/DocValuesType.html

Is the comment in the example schema file completely wrong, or is there
some issue with using docValues with a multivalued StrField?

Tom Burton-West

https://www.hathitrust.org/blogslarge-scale-search


Re: DocValues and facet searches

2017-01-30 Thread alessandro.benedetti
It could be.
Which version of Solr are you using ?
Unfortunately I cannot access your response snippets. Is there any exception
in the logs?

Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DocValues-and-facet-searches-tp4317763p4317814.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: docValues error

2016-02-29 Thread David Santamauro


thanks Shawn, that seems to be the error exactly.

On 02/29/2016 09:22 AM, Shawn Heisey wrote:

On 2/28/2016 3:31 PM, David Santamauro wrote:


I'm porting a 4.8 schema to 5.3 and I came across this new error when
I tried to group.field=f1:

unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED).
Use UninvertingReader or index with docvalues.

f1 is defined as

 
   
 
 
 
   
 

   

Notice that I don't have docValues defined. I realize the field type
doesn't allow docValues so why does this group request fail with a
docValues error? It did work with 4.8

Any clue would be appreciated, thanks


It sounds like you are running into pretty much exactly what I did with 5.x.

https://issues.apache.org/jira/browse/SOLR-8088

I had to create a copyField that's a string (StrField) type and include
docValues on that field.  I still can't use my tokenized field like I
want to, as I do in 4.x.

Thanks,
Shawn



Re: docValues error

2016-02-29 Thread Shawn Heisey
On 2/28/2016 3:31 PM, David Santamauro wrote:
>
> I'm porting a 4.8 schema to 5.3 and I came across this new error when
> I tried to group.field=f1:
>
> unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED).
> Use UninvertingReader or index with docvalues.
>
> f1 is defined as
>
>  positionIncrementGap="100">
>   
> 
> 
> 
>   
> 
>
>required="true" />
>
> Notice that I don't have docValues defined. I realize the field type
> doesn't allow docValues so why does this group request fail with a
> docValues error? It did work with 4.8
>
> Any clue would be appreciated, thanks

It sounds like you are running into pretty much exactly what I did with 5.x.

https://issues.apache.org/jira/browse/SOLR-8088

I had to create a copyField that's a string (StrField) type and include
docValues on that field.  I still can't use my tokenized field like I
want to, as I do in 4.x.
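
The workaround described above could be sketched as a Schema API request body,
built here in Python. This assumes a managed schema; with a hand-edited
schema.xml the equivalent field and copyField would be added by hand instead.
The field name f1_str is hypothetical, and the "string" type is assumed to be
defined as solr.StrField, as it is in the stock example schemas.

```python
import json


def copyfield_workaround(source: str, dest: str) -> str:
    """Build a Schema API body that adds a docValues string copy of `source`."""
    body = {
        "add-field": {
            "name": dest,
            "type": "string",    # assumed to map to solr.StrField
            "indexed": False,
            "stored": False,
            "docValues": True,   # grouping then reads the SORTED docValues
        },
        "add-copy-field": {"source": source, "dest": dest},
    }
    return json.dumps(body, indent=2)


print(copyfield_workaround("f1", "f1_str"))
```

The body would be POSTed to the collection's /schema endpoint and followed by
a full reindex, after which group.field=f1_str groups on the untokenized value.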

Thanks,
Shawn



Re: docValues error

2016-02-29 Thread David Santamauro



On 02/29/2016 07:59 AM, Tom Evans wrote:

On Mon, Feb 29, 2016 at 11:43 AM, David Santamauro
 wrote:

You will have noticed below, the field definition does not contain
multiValues=true


What version of the schema are you using? In pre 1.1 schemas,
multiValued="true" is the default if it is omitted.


1.5

Other single-value fields (tint, string) group correctly. The move from 
4.8 to 5.3 has rendered grouping on populated, single-value, 
solr.TextField fields crippled -- at least for me.


Re: docValues error

2016-02-29 Thread Tom Evans
On Mon, Feb 29, 2016 at 11:43 AM, David Santamauro
 wrote:
> You will have noticed below, the field definition does not contain
> multiValues=true

What version of the schema are you using? In pre 1.1 schemas,
multiValued="true" is the default if it is omitted.

Cheers

Tom


Re: docValues error

2016-02-29 Thread David Santamauro




On 02/29/2016 06:05 AM, Mikhail Khludnev wrote:

On Mon, Feb 29, 2016 at 12:43 PM, David Santamauro <
david.santama...@gmail.com> wrote:


unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). Use
UninvertingReader or index with docvalues.


  DocValues is primary citizen api for accessing forward-view index, ie. it
replaced FieldCache. The error is caused by an attempt to group by
multivalue field, which is explicitly claimed as unsupported in the doc.



As you will have noticed below, the field definition does not contain
multiValued=true




On 02/28/2016 05:31 PM, David Santamauro wrote:



f1 is defined as

  

  
  
  

  





Re: docValues error

2016-02-29 Thread Mikhail Khludnev
On Mon, Feb 29, 2016 at 12:43 PM, David Santamauro <
david.santama...@gmail.com> wrote:

>
> So I started over (deleted all documents), re-deployed configs to
> zookeeper and reloaded the collection.
>
> This error still appears when I group.field=f1
>
> unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). Use
> UninvertingReader or index with docvalues.
>
> What exactly does this error mean and why am I getting it with a field
> that doesn't even have docValues defined?
>
> Why is the DocValues code being used when docValues are not defined
> anywhere in my schema.xml?
>

 DocValues is the first-class API for accessing the forward (per-document) view
of the index, i.e. it replaced FieldCache. The error is caused by an attempt to
group by a multivalued field, which the documentation explicitly states is
unsupported.

Take care!


>
> null:java.lang.IllegalStateException: unexpected docvalues type SORTED_SET
> for field 'f1' (expected=SORTED). Use UninvertingReader or index with
> docvalues.
> at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
> at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
> at
> org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector.doSetNextReader(TermFirstPassGroupingCollector.java:92)
> at
> org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
> at
> org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:117)
> at
> org.apache.lucene.search.TimeLimitingCollector.getLeafCollector(TimeLimitingCollector.java:144)
> at
> org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:117)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:763)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
> at
> org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:233)
> at
> org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:160)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:398)
>
> etc ...
>
>
>
> On 02/28/2016 05:31 PM, David Santamauro wrote:
>
>>
>> I'm porting a 4.8 schema to 5.3 and I came across this new error when I
>> tried to group.field=f1:
>>
>> unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED).
>> Use UninvertingReader or index with docvalues.
>>
>> f1 is defined as
>>
>>  > positionIncrementGap="100">
>>
>>  
>>  
>>  
>>
>>  
>>
>>> required="true" />
>>
>> Notice that I don't have docValues defined. I realize the field type
>> doesn't allow docValues so why does this group request fail with a
>> docValues error? It did work with 4.8
>>
>> Any clue would be appreciated, thanks
>>
>> David
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: docValues error

2016-02-29 Thread David Santamauro


So I started over (deleted all documents), re-deployed configs to 
zookeeper and reloaded the collection.


This error still appears when I group.field=f1

unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). 
Use UninvertingReader or index with docvalues.


What exactly does this error mean and why am I getting it with a field 
that doesn't even have docValues defined?


Why is the DocValues code being used when docValues are not defined 
anywhere in my schema.xml?



null:java.lang.IllegalStateException: unexpected docvalues type 
SORTED_SET for field 'f1' (expected=SORTED). Use UninvertingReader or 
index with docvalues.

at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
	at 
org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector.doSetNextReader(TermFirstPassGroupingCollector.java:92)
	at 
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
	at 
org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:117)
	at 
org.apache.lucene.search.TimeLimitingCollector.getLeafCollector(TimeLimitingCollector.java:144)
	at 
org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:117)

at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:763)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
	at 
org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:233)
	at 
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:160)
	at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:398)


etc ...



On 02/28/2016 05:31 PM, David Santamauro wrote:


I'm porting a 4.8 schema to 5.3 and I came across this new error when I
tried to group.field=f1:

unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED).
Use UninvertingReader or index with docvalues.

f1 is defined as

 
   
 
 
 
   
 

   

Notice that I don't have docValues defined. I realize the field type
doesn't allow docValues so why does this group request fail with a
docValues error? It did work with 4.8

Any clue would be appreciated, thanks

David


Re: docValues error

2016-02-28 Thread shamik
David, this is a tad weird. I've seen this error when docValues is turned on
for an existing field. You can try running an "optimize" on your index and see
if it helps.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/docValues-error-tp4260408p4260455.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DocValues error

2015-11-13 Thread Anshum Gupta
Hi Devansh,

Yes you'd need to reindex your data in order to use DocValues. It's
highlighted here @ the official ref guide :

https://cwiki.apache.org/confluence/display/solr/DocValues
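
Why a schema reload alone is not enough can be sketched with a toy model in
Python. It assumes, as a simplification, that each segment records the
docValues type it was written with, and that sorting needs the expected type in
every segment. The names Segment and check_field are illustrative, not Lucene's
API; the message text is copied from the error quoted below.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    doc_values_type: str  # "NONE" if written before docValues was enabled


def check_field(segments, field, expected):
    """Fail on the first segment whose docValues type is not `expected`."""
    for seg in segments:
        if seg.doc_values_type != expected:
            raise RuntimeError(
                f"unexpected docvalues type {seg.doc_values_type} for field "
                f"'{field}' (expected={expected}). "
                "Use UninvertingReader or index with docvalues."
            )


# Segments written before the schema change carry no docValues for the field.
old_index = [Segment("_0", "NONE"), Segment("_1", "NONE")]
# After a full reindex, every segment has them.
reindexed = [Segment("_0", "NUMERIC"), Segment("_1", "NUMERIC")]

check_field(reindexed, "lastpublishdate", "NUMERIC")  # passes silently
```

Calling check_field on old_index raises with the same wording as the reported
error, which is why the existing documents have to be indexed again.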

On Fri, Nov 13, 2015 at 10:00 AM, Dhutia, Devansh 
wrote:

> We have an existing collection with a field called lastpublishdate of type
> tdate. It already has a lot of data indexed, and we want to add docValues
> to improve our sorting performance on the field.
>
> The old field definition was:
>
>  
>
> We recently changed it to
>
>   docValues="true"/>
>
> Is that considered a breaking change? Upon deploying the schema &
> reloading the collection, sorting on the field fails with the following error:
>
> unexpected docvalues type NONE for field 'lastpublishdate'
> (expected=NUMERIC). Use UninvertingReader or index with docvalues.
>
> Do we really need to wipe & rebuild the entire index to add docValues to
> an existing dataset?
>
> Thanks
>



-- 
Anshum Gupta


Re: DocValues error

2015-11-13 Thread Dhutia, Devansh
Ugh! I totally missed the highlight. 

Thanks for clarifying. 




On 11/13/15, 1:07 PM, "Anshum Gupta"  wrote:

>Hi Devansh,
>
>Yes you'd need to reindex your data in order to use DocValues. It's
>highlighted here @ the official ref guide :
>
>https://cwiki.apache.org/confluence/display/solr/DocValues
>
>On Fri, Nov 13, 2015 at 10:00 AM, Dhutia, Devansh 
>wrote:
>
>> We have an existing collection with a field called lastpublishdate of type
>> tdate. It already has a lot of data indexed, and we want to add docValues
>> to improve our sorting performance on the field.
>>
>> The old field definition was:
>>
>>  
>>
>> We recently changed it to
>>
>>  > docValues="true"/>
>>
>> Is that considered a breaking change? Upon deploying the schema &
>> reloading the collection, sorting on the field fails with the following error:
>>
>> unexpected docvalues type NONE for field 'lastpublishdate'
>> (expected=NUMERIC). Use UninvertingReader or index with docvalues.
>>
>> Do we really need to wipe & rebuild the entire index to add docValues to
>> an existing dataset?
>>
>> Thanks
>>
>
>
>
>-- 
>Anshum Gupta


Re: docValues

2015-08-09 Thread Yonik Seeley
Interesting... what type of field was this? (string or numeric? single
or multi-valued?)

Without docValues, the first request would be slow (due to building
the in-memory field cache entry), but after that it should be fast.

-Yonik


On Sun, Aug 9, 2015 at 11:31 AM, Nagasharath sharathrayap...@gmail.com wrote:
 I Have tested with docValue and without docValue on the test indexes with a 
 json nested faceting query.

 Have noticed performance boot with the docValue.The response time with Cached 
 items and without cached items is good.

 I have noticed that the response time on the cached items of the index 
 without docValue is not always constant (28 Ms, 78 Ms, 94 Ms). Where as with 
 docValue is always constant( always 20 Ms)

 Decided to go with docValue.

 On 08-Aug-2015, at 10:44 pm, Erick Erickson erickerick...@gmail.com wrote:

 Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues?

 What kind of speedup? How often are you committing? Is there a speed 
 difference
 after a while or on the first few queries?

 Details matter a lot for questions like this.

 Best,
 Erick

 On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath sharathrayap...@gmail.com 
 wrote:
 Good

 Sent from my iPhone

 On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote:

 Hi,


 I am seeing a significant difference in the query time after using 
 docValue

 what kind of difference, is it good or bad?

 With Regards
 Aman Tandon

 On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com
 wrote:

 I am seeing a significant difference in the query time after using
 docValue.

 I am curious to know what's happening with 'docValue' included in the
 schema

 On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:

 On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%

 In my experience, a healthy Java application (after the heap size has
 stabilized) will have a heap utilization graph where the low points are
 between 50 and 75 percent.  If the low points in heap utilization are
 consistently below 25 percent, you would be better off reducing the heap
 size and allowing the OS to use that memory instead.

 If you want to track heap utilization, JVM-Memory in the Solr dashboard
 is a very poor tool.  Use tools like visualvm or jconsole.

 https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 I need to add what I said about very low heap utilization to that wiki
 page.

 Thanks,
 Shawn



Re: docValues

2015-08-09 Thread Nagasharath
I have tested with and without docValues on the test indexes, using a
JSON nested faceting query.

I noticed a performance boost with docValues. The response time is good both
with and without cached items.

I also noticed that the response time on cached items for the index without
docValues is not constant (28 ms, 78 ms, 94 ms), whereas with docValues it
is constant (always 20 ms).

I decided to go with docValues.

 On 08-Aug-2015, at 10:44 pm, Erick Erickson erickerick...@gmail.com wrote:
 
 Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues?
 
 What kind of speedup? How often are you committing? Is there a speed 
 difference
 after a while or on the first few queries?
 
 Details matter a lot for questions like this.
 
 Best,
 Erick
 
 On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath sharathrayap...@gmail.com 
 wrote:
 Good
 
 Sent from my iPhone
 
 On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote:
 
 Hi,
 
 
 I am seeing a significant difference in the query time after using docValue
 
 what kind of difference, is it good or bad?
 
 With Regards
 Aman Tandon
 
 On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com
 wrote:
 
 I am seeing a significant difference in the query time after using
 docValue.
 
 I am curious to know what's happening with 'docValue' included in the
 schema
 
 On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%
 
 In my experience, a healthy Java application (after the heap size has
 stabilized) will have a heap utilization graph where the low points are
 between 50 and 75 percent.  If the low points in heap utilization are
 consistently below 25 percent, you would be better off reducing the heap
 size and allowing the OS to use that memory instead.
 
 If you want to track heap utilization, JVM-Memory in the Solr dashboard
 is a very poor tool.  Use tools like visualvm or jconsole.
 
 https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
 
 I need to add what I said about very low heap utilization to that wiki
 page.
 
 Thanks,
 Shawn
 


Re: docValues

2015-08-09 Thread Nagasharath
JSON nested faceting on string, string, and double fields. The facet function
'sum' is applied to the double field.
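
For readers unfamiliar with the query shape, a nested JSON facet of this kind
might look like the following, built here in Python. The field names cat_a,
cat_b, and amount are hypothetical stand-ins for the two string fields and the
double field; the real request was not posted.

```python
import json


def nested_sum_facet(outer: str, inner: str, metric_field: str) -> dict:
    """Terms facet on `outer`, sub-facet on `inner`, sum of `metric_field` per bucket."""
    return {
        "by_outer": {
            "type": "terms",
            "field": outer,
            "facet": {
                "by_inner": {
                    "type": "terms",
                    "field": inner,
                    "facet": {"total": f"sum({metric_field})"},
                }
            },
        }
    }


request_body = {
    "query": "*:*",
    "limit": 0,  # only the facet buckets are of interest here
    "facet": nested_sum_facet("cat_a", "cat_b", "amount"),
}
print(json.dumps(request_body, indent=2))
```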

Without docValues, response times for the same query:
1) First response, without cache: 765 ms
2) Second response, with cache: 28 ms
3) Third response, with cache: 78 ms
4) Fourth response, with cache: 94 ms

With docValues, response times for the same query:
1) First response, without cache: 78 ms
2) With cache, the response is always less than 20 ms

Version 5.2.1
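
To put numbers on the figures above, here is a small Python calculation that
uses only the timings reported in this message. The cached docValues case is
given only as "less than 20 ms", so no mean is computed for it.

```python
from statistics import mean

# Timings as reported above, in milliseconds.
first_without_dv_ms = 765
first_with_dv_ms = 78
cached_without_dv_ms = [28, 78, 94]

# Ratio of the uncached first responses.
cold_speedup = first_without_dv_ms / first_with_dv_ms
# Mean and spread of the cached responses without docValues.
cached_mean = mean(cached_without_dv_ms)
cached_spread = max(cached_without_dv_ms) - min(cached_without_dv_ms)

print(f"uncached speedup with docValues: {cold_speedup:.1f}x")
print(f"cached, no docValues: mean {cached_mean:.1f} ms, spread {cached_spread} ms")
```

Three samples are too few for firm conclusions, but they show both the gap on
the first request and the variability of the cached responses without
docValues.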

 On 09-Aug-2015, at 10:39 am, Yonik Seeley ysee...@gmail.com wrote:
 
 Interesting... what type of field was this? (string or numeric? single
 or multi-valued?)
 
 Without docValues, the first request would be slow (due to building
 the in-memory field cache entry), but after that it should be fast.
 
 -Yonik
 
 
 On Sun, Aug 9, 2015 at 11:31 AM, Nagasharath sharathrayap...@gmail.com 
 wrote:
 I Have tested with docValue and without docValue on the test indexes with a 
 json nested faceting query.
 
 Have noticed performance boot with the docValue.The response time with 
 Cached items and without cached items is good.
 
 I have noticed that the response time on the cached items of the index 
 without docValue is not always constant (28 Ms, 78 Ms, 94 Ms). Where as with 
 docValue is always constant( always 20 Ms)
 
 Decided to go with docValue.
 
 On 08-Aug-2015, at 10:44 pm, Erick Erickson erickerick...@gmail.com wrote:
 
 Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues?
 
 What kind of speedup? How often are you committing? Is there a speed 
 difference
 after a while or on the first few queries?
 
 Details matter a lot for questions like this.
 
 Best,
 Erick
 
 On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath sharathrayap...@gmail.com 
 wrote:
 Good
 
 Sent from my iPhone
 
 On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote:
 
 Hi,
 
 
 I am seeing a significant difference in the query time after using 
 docValue
 
 what kind of difference, is it good or bad?
 
 With Regards
 Aman Tandon
 
 On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com
 wrote:
 
 I am seeing a significant difference in the query time after using
 docValue.
 
 I am curious to know what's happening with 'docValue' included in the
 schema
 
 On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%
 
 In my experience, a healthy Java application (after the heap size has
 stabilized) will have a heap utilization graph where the low points are
 between 50 and 75 percent.  If the low points in heap utilization are
 consistently below 25 percent, you would be better off reducing the heap
 size and allowing the OS to use that memory instead.
 
 If you want to track heap utilization, JVM-Memory in the Solr dashboard
 is a very poor tool.  Use tools like visualvm or jconsole.
 
 https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
 
 I need to add what I said about very low heap utilization to that wiki
 page.
 
 Thanks,
 Shawn
 


Re: docValues

2015-08-08 Thread Nagasharath
I am seeing a significant difference in the query time after using docValue.

I am curious to know what's happening with 'docValue' included in the schema

 On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%
 
 In my experience, a healthy Java application (after the heap size has
 stabilized) will have a heap utilization graph where the low points are
 between 50 and 75 percent.  If the low points in heap utilization are
 consistently below 25 percent, you would be better off reducing the heap
 size and allowing the OS to use that memory instead.
 
 If you want to track heap utilization, JVM-Memory in the Solr dashboard
 is a very poor tool.  Use tools like visualvm or jconsole.
 
 https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
 
 I need to add what I said about very low heap utilization to that wiki page.
 
 Thanks,
 Shawn
 


Re: docValues

2015-08-08 Thread Aman Tandon
Hi,


 I am seeing a significant difference in the query time after using docValue

what kind of difference, is it good or bad?

With Regards
Aman Tandon

On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com
wrote:

 I am seeing a significant difference in the query time after using
 docValue.

 I am curious to know what's happening with 'docValue' included in the
 schema

  On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:
 
  On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
  JVM-Memory has gone up from 3% to 17.1%
 
  In my experience, a healthy Java application (after the heap size has
  stabilized) will have a heap utilization graph where the low points are
  between 50 and 75 percent.  If the low points in heap utilization are
  consistently below 25 percent, you would be better off reducing the heap
  size and allowing the OS to use that memory instead.
 
  If you want to track heap utilization, JVM-Memory in the Solr dashboard
  is a very poor tool.  Use tools like visualvm or jconsole.
 
  https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
 
  I need to add what I said about very low heap utilization to that wiki
 page.
 
  Thanks,
  Shawn
 



Re: docValues

2015-08-08 Thread Nagasharath
Good


 On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote:
 
 Hi,
 
 
 I am seeing a significant difference in the query time after using docValue
 
 what kind of difference, is it good or bad?
 
 With Regards
 Aman Tandon
 
 On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com
 wrote:
 
 I am seeing a significant difference in the query time after using
 docValue.
 
 I am curious to know what's happening with 'docValue' included in the
 schema
 
 On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%
 
 In my experience, a healthy Java application (after the heap size has
 stabilized) will have a heap utilization graph where the low points are
 between 50 and 75 percent.  If the low points in heap utilization are
 consistently below 25 percent, you would be better off reducing the heap
 size and allowing the OS to use that memory instead.
 
 If you want to track heap utilization, JVM-Memory in the Solr dashboard
 is a very poor tool.  Use tools like visualvm or jconsole.
 
 https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
 
 I need to add what I said about very low heap utilization to that wiki
 page.
 
 Thanks,
 Shawn
 


Re: docValues

2015-08-08 Thread Erick Erickson
Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues?

What kind of speedup? How often are you committing? Is there a speed difference
after a while or on the first few queries?

Details matter a lot for questions like this.

Best,
Erick

On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath sharathrayap...@gmail.com wrote:
 Good

 Sent from my iPhone

 On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote:

 Hi,


 I am seeing a significant difference in the query time after using docValue

 what kind of difference, is it good or bad?

 With Regards
 Aman Tandon

 On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com
 wrote:

 I am seeing a significant difference in the query time after using
 docValue.

 I am curious to know what's happening with 'docValue' included in the
 schema

 On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:

 On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%

 In my experience, a healthy Java application (after the heap size has
 stabilized) will have a heap utilization graph where the low points are
 between 50 and 75 percent.  If the low points in heap utilization are
 consistently below 25 percent, you would be better off reducing the heap
 size and allowing the OS to use that memory instead.

 If you want to track heap utilization, JVM-Memory in the Solr dashboard
 is a very poor tool.  Use tools like visualvm or jconsole.

 https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 I need to add what I said about very low heap utilization to that wiki
 page.

 Thanks,
 Shawn



Re: docValues

2015-08-07 Thread Shawn Heisey
On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%

In my experience, a healthy Java application (after the heap size has
stabilized) will have a heap utilization graph where the low points are
between 50 and 75 percent.  If the low points in heap utilization are
consistently below 25 percent, you would be better off reducing the heap
size and allowing the OS to use that memory instead.

If you want to track heap utilization, JVM-Memory in the Solr dashboard
is a very poor tool.  Use tools like visualvm or jconsole.
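
The rule of thumb above can be written down as a small Python helper. It is
only a sketch of the guideline as stated: the function name and the wording of
the verdicts are made up, and the low-point percentages would have to be read
off a heap graph in a tool such as visualvm or jconsole.

```python
def heap_verdict(low_points_pct):
    """Classify the low points of a heap utilization graph (percent of max heap)."""
    if not low_points_pct:
        raise ValueError("need at least one low point")
    if all(p < 25 for p in low_points_pct):
        # Consistently below 25 percent: the heap is larger than needed.
        return "oversized: consider reducing the heap"
    if all(50 <= p <= 75 for p in low_points_pct):
        return "healthy"
    return "inconclusive"


print(heap_verdict([10, 15, 20]))
print(heap_verdict([55, 60, 70]))
```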

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

I need to add what I said about very low heap utilization to that wiki page.

Thanks,
Shawn



Re: docValues

2015-08-07 Thread Erick Erickson
My crude approximation is that docValues increases the disk size,
but that's mostly due to serializing data that would be built in-memory,
basically serializing these structures. So while they increase
disk size, the memory requirements aren't increased. AFAIK, the
in-memory size is a bit smaller and (I think) some of the data can be
in MMapDirectory space which decreases pressure on the JVM heap,
which is a good thing.

So the measure of whether anything's wrong is really whether your JVM
memory goes up or down afterwards.

I'm surprised by the word "exponential" here. If you're just measuring with
a few documents, the growth rate is probably deceptive.

And be sure to re-index the entire corpus after adding docValues, I usually
remove the entire data directory when I change the schema.

Best,
Erick

On Fri, Aug 7, 2015 at 12:25 PM, naga sharathrayapati
sharathrayap...@gmail.com wrote:
 i have added docValues=true to my existing schema and I have seen
 exponential increase in the size of the index.

 The reason for going with docValues is to improve the faceting query time.

 schema:
 <field name="abcdef" type="date" indexed="true" stored="true"
 multiValued="false" docValues="true"/>

 Is this ok? am i doing anything wrong in the schema?


Re: docValues

2015-08-07 Thread Shawn Heisey
On 8/7/2015 10:25 AM, naga sharathrayapati wrote:
 i have added docValues=true to my existing schema and I have seen
 exponential increase in the size of the index.

 The reason for going with docValues is to improve the faceting query time.

 schema:
 <field name="abcdef" type="date" indexed="true" stored="true"
 multiValued="false" docValues="true"/>

 Is this ok? am i doing anything wrong in the schema?

An exponential increase would REALLY surprise me.  I have seen indexes
nearly double in size with the addition of docValues on the primary
search field.  You are storing another complete copy of the original
value sent to Solr.  It will be compressed if your Solr version is at
least 4.2, just like the stored value is in 4.1 and later.

If you are only adding docValues to a date field, I would not expect
that to affect your index size very much at all, but if you are adding
them to large text fields, it would.

Performance tweaks are nearly always a trade -- typically your space and
memory utilization goes up, and it runs faster.  If you don't have
enough memory, then either performance will go down or the system will
stop working entirely.

Thanks,
Shawn



Re: docValues

2015-08-07 Thread naga sharathrayapati
JVM-Memory has gone up from 3% to 17.1%

On Fri, Aug 7, 2015 at 12:10 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 8/7/2015 10:25 AM, naga sharathrayapati wrote:
  i have added docValues=true to my existing schema and I have seen
  exponential increase in the size of the index.
 
  The reason for going with docValues is to improve the faceting query time.
 
  schema:
  <field name="abcdef" type="date" indexed="true" stored="true"
  multiValued="false" docValues="true"/>
 
  Is this ok? am i doing anything wrong in the schema?

 An exponential increase would REALLY surprise me.  I have seen indexes
 nearly double in size with the addition of docValues on the primary
 search field.  You are storing another complete copy of the original
 value sent to Solr.  It will be compressed if your Solr version is at
 least 4.2, just like the stored value is in 4.1 and later.

 If you are only adding docValues to a date field, I would not expect
 that to affect your index size very much at all, but if you are adding
 them to large text fields, it would.

 Performance tweaks are nearly always a trade -- typically your space and
 memory utilization goes up, and it runs faster.  If you don't have
 enough memory, then either performance will go down or the system will
 stop working entirely.

 Thanks,
 Shawn




Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Alessandro Benedetti
First of all, DocValues is a strategy for storing on disk (or in memory) the
un-inverted index for the fields of interest.
It was introduced to speed up facet calculation with the fc
algorithm and to reduce memory usage.
It is really odd that it would cause a performance degradation.

Building DocValues should reduce the query time needed to compute facets, at
the cost of increased indexing time.
Are you sure nothing else could be affecting your timings?

Let's try to help you out!

2015-07-02 4:19 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 Hi,

 I tried to use the docValues to reduce the search time, but when I am using
 the default format for docValues it is taking more time as compared to
 normal faceting technique (without docValues).

 Should I go for Memory format or there is something missing?

 *Note:-* I am doing the indexing at every 10 minutes and I am using solr
 4.8.1

 With Regards
 Aman Tandon




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Aman Tandon
Hi,

I ran the query both without and with docValues, and the query with docValues
took more time. Could this be because disk I/O is involved, since some of the
data lives in files on disk?

Are you sure anything else could affect your times ?


Yes, I am sure. We re-indexed the whole index of 40 million records to
add docValues and improve speed. I ran the same query simultaneously
with and without docValues, and the docValues version takes about 200 ms
longer. As far as I can see, the gap grows as the number of hits
increases.

*My configuration for docValue is:*

<field name="citydv" type="string" docValues="true" stored="true"
       required="false" omitNorms="true" multiValued="false" />


With Regards
Aman Tandon




Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Aman Tandon
Anything wrong?

With Regards
Aman Tandon






Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Toke Eskildsen
Alessandro Benedetti benedetti.ale...@gmail.com wrote:
> DocValues is a strategy to store on the disk ( or in memory) the
> Un-inverted index for the field of interests.

True.

> This has been done to SPEED UP the faceting calculus using the fc
> algorithm, and improve the memory usage.

Part of the reason was to speed up the _startup_ time for faceting.

This is not the first time I have read about people getting poorer query
performance with DocValues. It does make sense: DocValues in the index
compete with other files for disk caching, and even when they are fully
cached, the UnInverted structure has a speed edge because it is directly
accessible as a standard on-heap memory structure.

The difference is likely to vary a great deal depending on the concrete
corpus and hardware.

- Toke Eskildsen


Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Erick Erickson
How are you testing? I'd do a couple of things:
1> turn off your queryResultCache (set its size to 0).
2> run multiple queries through something like JMeter.
3> ensure you've run enough warmup queries to load
    all your fields into memory.

Basically, if this were always the case, I'd expect a
_lot_ of people to be talking about it. I suspect there's
something in your test methodology that's giving you
inaccurate results.
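
The steps above could be sketched as a small timing harness. This is only an illustration of the methodology (warm up first, then time many runs and compare medians), not a replacement for JMeter. The function `measure`, its parameters and the stand-in `fake_query` are hypothetical; in a real test `run_query` would send the request to Solr with the queryResultCache disabled.

```python
import statistics
import time

def measure(run_query, queries, warmup_rounds=2, timed_rounds=5):
    """Return the median latency in milliseconds for each query string.

    run_query is a caller-supplied (hypothetical) function that sends one
    query to the server and blocks until the response arrives.
    """
    # Warm-up: timings are discarded so caches and memory-mapped files load.
    for _ in range(warmup_rounds):
        for q in queries:
            run_query(q)

    samples = {q: [] for q in queries}
    for _ in range(timed_rounds):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples[q].append((time.perf_counter() - start) * 1000.0)
    return {q: statistics.median(times) for q, times in samples.items()}

# Stand-in for a real HTTP call, used only to keep the sketch runnable.
calls = []
def fake_query(q):
    calls.append(q)

medians = measure(fake_query, ["facet.field=city", "facet.field=citydv"])
```

Interleaving the two variants inside each round, rather than running one variant to completion first, keeps both exposed to the same background load.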




Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Aman Tandon
So should I use Memory format?

With Regards
Aman Tandon




Re: DocValues memory consumption thoughts

2015-06-11 Thread Alessandro Benedetti
Mmm, DocValues is actually an un-inverted index that is built as part of
the segment.
This means that it behaves the same way as the other segment files.
Assuming you are indexing not a compound segment file but a classic
multi-file segment in an NRTCachingDirectory,
the segment is built in memory, and when it reaches ramBufferSizeMB or a
hard commit happens, it is flushed to disk.

This means that, in my opinion, using DocValues causes no particular
memory degradation.
I would actually say that using DocValues instead of the old FieldCache
decreases memory consumption, as FieldCaches are held completely in memory
(with the expensive un-inverting process).
From the Solr wiki:

In Lucene 4.0, a new approach was introduced. DocValue fields are now
column-oriented fields with a document-to-value mapping built at index
time. This approach promises to relieve some of the memory requirements of
the fieldCache and make lookups for faceting, sorting, and grouping much
faster.

I would size memory according to the other features you will use!
Let me know if I satisfied your curiosity!

Cheers

2015-06-11 15:38 GMT+01:00 adfel70 adfe...@gmail.com:

> I am using DocValues and I am wondering how to configure Solr's processes
> java's heap size: does DocValues uses system cache (off heap memory) or
> heap memory? should I take  DocValues into consideration when I calculate
> heap parameters (xmx, xmn, xms...)?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DocValues-memory-consumption-thoughts-tp4211187.html
> Sent from the Solr - User mailing list archive at Nabble.com.






Re: docValues in solr/lucene 4.8.x

2015-06-05 Thread Shawn Heisey
On 6/4/2015 11:42 PM, pras.venkatesh wrote:
> I see docValues has been there since Lucene 4.0. so can I use docValues with
> my current solr cloud version of 4.8.x
>
> The reason I am asking is because, I have deployment mechanism and securing
> the index (using Tomcat valve) all built out based on Tomcat which I need
> figure out all the way again with Jetty.
>
> so thinking if I could use docValues with solr/lucene 4.8.x in order to
> perform sort/facet queries effectively(consuming less heap memory)

Solr 4.8 can do docValues.  To enable the feature on a field, you just
need to change the field definition in schema.xml to include docValues="true".

Note that you need to completely reindex.  After you make the change and
restart or reload, sorting and facets will NOT work until the reindex is
done, because when docValues is present in the schema, Solr will try to
use docValues, and that data will not be present unless you reindex.
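
As an illustration of the schema change only, here is a small sketch that sets the attribute on one field of a schema fragment. The fragment, the field names and the helper `enable_doc_values` are hypothetical; editing schema.xml by hand works just as well, and the full reindex described above is still required afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; the field names are illustrative only.
SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="city" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""

def enable_doc_values(schema_xml, field_name):
    """Return schema XML with docValues="true" set on the named field."""
    root = ET.fromstring(schema_xml)
    matched = False
    for field in root.iter("field"):
        if field.get("name") == field_name:
            field.set("docValues", "true")
            matched = True
    if not matched:
        raise KeyError(f"no field named {field_name!r} in schema")
    return ET.tostring(root, encoding="unicode")

updated = enable_doc_values(SCHEMA, "city")
```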

https://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: docValues in solr/lucene 4.8.x

2015-06-05 Thread Alessandro Benedetti
I would like to add this to Shawn's description:

 DocValues are only available for specific field types. The types chosen
 determine the underlying Lucene docValue type that will be used. The
 available Solr field types are:

 - StrField and UUIDField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the SORTED type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.
 - Any Trie* numeric fields and EnumField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the NUMERIC type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.

 These Lucene types are related to how the values are sorted and stored.
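
The mapping quoted above can be written down as a lookup. This is a sketch of the rule as stated in the quoted text for this Solr generation, not a Solr API; the function name `lucene_doc_values_type` is made up for illustration.

```python
def lucene_doc_values_type(solr_field_type, multi_valued):
    """Return the Lucene docValues type for a Solr field type, per the text above."""
    string_like = {"StrField", "UUIDField"}
    numeric_like = (
        solr_field_type.startswith("Trie") or solr_field_type == "EnumField"
    )
    if solr_field_type in string_like:
        return "SORTED_SET" if multi_valued else "SORTED"
    if numeric_like:
        return "SORTED_SET" if multi_valued else "NUMERIC"
    # Anything else (for example an analyzed text field) is not in the list.
    raise ValueError(f"docValues not listed for field type {solr_field_type!r}")
```

For example, `lucene_doc_values_type("StrField", False)` gives `"SORTED"`, which is the case for a single-valued string field used for faceting.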


Keep this in mind when designing your schema.

Furthermore, although I guess it's obvious, always evaluate the cardinality
of the field you want to facet on.
For low cardinality it is not even necessary to build docValues.

The facet.method parameter selects the type of algorithm or method Solr
 should use when faceting a field.

 Setting   Results

 enum      Enumerates all terms in a field, calculating the set intersection
           of documents that match the term with documents that match the
           query. This method is recommended for faceting multi-valued
           fields that have only a few distinct values. The average number
           of values per document does not matter. For example, faceting on
           a field with U.S. States such as Alabama, Alaska, ... Wyoming
           would lead to fifty cached filters which would be used over and
           over again. The filterCache should be large enough to hold all
           the cached filters.

 fc        Calculates facet counts by iterating over documents that match
           the query and summing the terms that appear in each document.
           This is currently implemented using an UnInvertedField cache if
           the field either is multi-valued or is tokenized (according to
           FieldType.isTokened()). Each document is looked up in the cache
           to see what terms/values it contains, and a tally is incremented
           for each value. This method is excellent for situations where the
           number of indexed values for the field is high, but the number of
           values per document is low. For multi-valued fields, a hybrid
           approach is used that uses term filters from the filterCache for
           terms that match many documents. The letters fc stand for field
           cache.

 fcs       Per-segment field faceting for single-valued string fields.
           Enable with facet.method=fcs and control the number of threads
           used with the threads local parameter. This parameter allows
           faceting to be faster in the presence of rapid index changes.

 The default value is fc (except for fields using the BoolField field
 type) since it tends to use less memory and is faster when a field has many
 unique terms in the index.

 This parameter can be specified on a per-field basis with the syntax of
 f.fieldname.facet.method.
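
To contrast the two main strategies described above, here is a toy sketch in Python. It mirrors only the counting logic (per-term set intersection for enum, per-document tallying for fc) and leaves out the filterCache, the UnInvertedField structure and everything else Solr actually does. The data and the function names are invented for illustration.

```python
from collections import Counter

# Toy index (assumption): doc id -> values of a multi-valued "state" field.
field_values = {
    0: ["Alabama"],
    1: ["Alaska", "Wyoming"],
    2: ["Alabama", "Wyoming"],
    3: ["Wyoming"],
}
query_matches = {1, 2, 3}  # doc ids matching the main query

def facet_enum(field_values, query_matches):
    """enum style: one set intersection per distinct term."""
    postings = {}
    for doc_id, values in field_values.items():
        for value in values:
            postings.setdefault(value, set()).add(doc_id)
    counts = {term: len(ids & query_matches) for term, ids in postings.items()}
    return {term: n for term, n in counts.items() if n > 0}

def facet_fc(field_values, query_matches):
    """fc style: walk the matching docs and tally the values each one holds."""
    tally = Counter()
    for doc_id in query_matches:
        tally.update(field_values.get(doc_id, []))
    return dict(tally)

enum_counts = facet_enum(field_values, query_matches)
fc_counts = facet_fc(field_values, query_matches)
```

Both functions return the same counts. The cost differs: enum does work proportional to the number of distinct terms, fc does work proportional to the number of matching documents, which is why the quoted text recommends enum for fields with few distinct values. Using the per-field syntax from the quoted text with a hypothetical field called state, that choice would be requested as f.state.facet.method=enum.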


All of this information comes from the official Solr wiki.

Cheers








Re: docValues: Can we apply synonym

2015-05-30 Thread Upayavira
What I'm suggesting is that you have two fields, one for searching, one
for faceting.

You may find you can't use docValues for your field type, in which case
Solr will just use caches to improve faceting performance.
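
A toy sketch of the two-field idea, in Python rather than Solr configuration. It assumes a hypothetical exact-value field `city_facet` (standing in for the docValues field) and a hypothetical `city_search` field whose analysis expands synonyms. The synonym table and documents are invented, and the counts reuse the Mumbai(3) / Bombay(2) example quoted elsewhere in this thread.

```python
from collections import Counter

# Hypothetical synonym group; in Solr this would live in the analysis
# chain of the search field, not in application code.
SYNONYMS = {"mumbai": {"mumbai", "bombay"}, "bombay": {"mumbai", "bombay"}}

docs = [
    {"id": 1, "city": "Mumbai"}, {"id": 2, "city": "Mumbai"},
    {"id": 3, "city": "Mumbai"}, {"id": 4, "city": "Bombay"},
    {"id": 5, "city": "Bombay"},
]

def index(doc):
    """Build both fields: exact value for facets, expanded terms for search."""
    term = doc["city"].lower()
    return {
        "id": doc["id"],
        "city_facet": doc["city"],                  # stands in for the docValues field
        "city_search": SYNONYMS.get(term, {term}),  # stands in for the analyzed copy
    }

indexed = [index(d) for d in docs]

# Facet counts come from the exact values, so the synonyms stay separate.
facets = Counter(d["city_facet"] for d in indexed)

# A filter on the search field matches every synonym of the clicked value.
hits = [d["id"] for d in indexed if "bombay" in d["city_search"]]
```

Here the facets show Mumbai with 3 and Bombay with 2, while filtering on either value returns all 5 documents. That mismatch is the trade-off of this approach.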

Upayavira

On Sat, May 30, 2015, at 01:50 AM, Aman Tandon wrote:
> Hi Upayavira,
>
> How will copyField help in my scenario, when I have to add the synonym
> to a docValues-enabled field?
>
> With Regards
> Aman Tandon
>
> On Sat, May 30, 2015 at 1:18 AM, Upayavira u...@odoko.co.uk wrote:
>
> > Use copyField to clone the field for faceting purposes.
> >
> > Upayavira
> >
> > On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote:
> > > Hi Erick,
> > >
> > > Thanks for the suggestion. We are using this query parser plugin
> > > (*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
> > > synonyms. So it does work slower than edismax, and that's why it is
> > > not in contrib, right? (I am asking this question because we are using
> > > it for all our searches to handle 10 multiword ice cube, icecube etc)
> > >
> > > *Moreover I thought of a solution for this docValue problem*
> > >
> > > I need to make the city field *multivalued*, and by this I mean I will
> > > add the synonym (*mumbai, bombay*) as an extra value to that field if
> > > present.
> > > Now the searching operation will work fine as before.
> > >
> > > *<field name="city">mumbai</field><field name="city">bombay</field>*
> > >
> > > The only problem is that we have to remove the 'city alias/synonym
> > > facets' when we are providing results to the clients.
> > >
> > > *mumbai, 1000*
> > >
> > > With Regards
> > > Aman Tandon
  

Re: docValues: Can we apply synonym

2015-05-29 Thread Alessandro Benedetti
Even if a little bit outdated, that query parser is really really cool to
manage synonyms !
+1 !

2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:

> Thanks chris.
>
> Yes we are using it for handling multiword synonym problem.
>
> With Regards
> Aman Tandon
>
> On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles
> charles.reit...@tiaa-cref.org wrote:
>
> > Again, I would recommend using Nolan Lawson's
> > SynonymExpandingExtendedDismaxQParserPlugin.
> >
> > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> >
> > -----Original Message-----
> > From: Aman Tandon [mailto:amantandon...@gmail.com]
> > Sent: Wednesday, May 27, 2015 6:42 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: docValues: Can we apply synonym
> >
> > Ok and what synonym processor you is talking about maybe it could help ?
> >
> > With Regards
> > Aman Tandon
> >
> > On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles
> > charles.reit...@tiaa-cref.org wrote:
> >
> > > Sorry, my bad.   The synonym processor I mention works differently.
> > > It's an extension of the EDisMax query processor and doesn't require
> > > field level synonym configs.
> > >
> > > -----Original Message-----
> > > From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
> > > Sent: Wednesday, May 27, 2015 6:12 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: docValues: Can we apply synonym
> > >
> > > But the query analysis isn't on a specific field, it is applied to the
> > > query string.
> > >
> > > -----Original Message-----
> > > From: Aman Tandon [mailto:amantandon...@gmail.com]
> > > Sent: Wednesday, May 27, 2015 6:08 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: docValues: Can we apply synonym
> > >
> > > Hi Charles,
> > >
> > > The problem here is that the docValues works only with primitives data
> > > type only like String, int, etc So how could we apply synonym on
> > > primitive data type.
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles
> > > charles.reit...@tiaa-cref.org wrote:
> > >
> > > > Is there any reason you cannot apply the synonyms at query time?
> > > > Applying synonyms at indexing time has problems, e.g. polluting the
> > > > term frequency for synonyms added, preventing distance queries, ...
> > > >
> > > > Since city names often have multiple terms, e.g. New York, Den
> > > > Hague, etc., I would recommend using Nolan Lawson's
> > > > SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
> > > > filling.
> > > >
> > > > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > > >
> > > > We found this to fix synonyms like ny for New York and vice versa.
> > > > Haven't tried it with docValues, tho.
> > > >
> > > > -----Original Message-----
> > > > From: Aman Tandon [mailto:amantandon...@gmail.com]
> > > > Sent: Tuesday, May 26, 2015 11:15 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: docValues: Can we apply synonym
> > > >
> > > > Yes it could be :)
> > > >
> > > > Anyway thanks for helping.
> > > >
> > > > With Regards
> > > > Aman Tandon
> > > >
> > > > On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti
> > > > benedetti.ale...@gmail.com wrote:
> > > >
> > > > > I should investigate that, as usually synonyms are analysis stage.
> > > > > A simple way is to replace the word with all its synonyms (
> > > > > including original word), but simply using this kind of processor
> > > > > will change the token position and offsets, modifying the actual
> > > > > content of the document .
> > > > >
> > > > > "I am from Bombay" will become "I am from Bombay Mumbai" which
> > > > > can be annoying.
> > > > > So a clever approach must be investigated.
> > > > >
> > > > > 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:
> > > > >
> > > > > > Okay So how could I do it with UpdateProcessors?
> > > > > >
> > > > > > With Regards
> > > > > > Aman Tandon
> > > > > >
> > > > > > On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti
> > > > > > benedetti.ale...@gmail.com wrote:
> > > > > >
> > > > > > > mmm this is different !
> > > > > > > Without any customisation, right now you could :
> > > > > > > - use docValues to provide exact value facets.
> > > > > > > - Than you can use a copy field, with the proper analysis, to
> > > > > > > search when a user click on a filter !
> > > > > > >
> > > > > > > So you will see in your facets :
> > > > > > > Mumbai(3)
> > > > > > > Bombay(2)
> > > > > > >
> > > > > > > And when clicking you see 5 results.
> > > > > > > A little bit misleading for the users …
> > > > > > >
> > > > > > > On the other hand if you you want to apply the synonyms
> > > > > > > before, the indexing pipeline ( because docValues field can
> > > > > > > not be analysed), I think you should play with UpdateProcessors.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:
> > > > > > >
> > > > > > > > We are interested in using docValues for better memory
> > > > > > > > utilization and speed.
> > > > > > > >
> > > > > > > > Currently we are faceting the search results on *city. *In
> > > > > > > > city we have also added the synonym for cities like mumbai,
> > > > > > > > bombay (These are Indian cities). So that result of mumbai is
> > > > > > > > also eligible when somebody will applying filter of bombay on
> > > > > > > > search results.
> > > > > > > >
> > > > > > > > I need this functionality

Re: docValues: Can we apply synonym

2015-05-29 Thread Erick Erickson
Do take time for performance testing with that parser. It can be slow
depending on your
data as I remember. That said it solves the problem it set out to
solve so if it meets
your SLAs, it can be a life-saver.

Best,
Erick


Re: docValues: Can we apply synonym

2015-05-29 Thread Aman Tandon
Hi Upayavira,

How will copyField help in my scenario, when I have to add the synonym
to a docValues-enabled field?

With Regards
Aman Tandon

   Sent: Tuesday, May 26, 2015 11:15 PM
   To: solr-user@lucene.apache.org
   Subject: Re: docValues: Can we apply synonym
  
   Yes it could be :)
  
   Anyway thanks for helping.
  
   With Regards
   Aman Tandon
  
   On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
   benedetti.ale...@gmail.com wrote:
  
I should investigate

Re: docValues: Can we apply synonym

2015-05-29 Thread Aman Tandon
Hi Erick,

Thanks for the suggestion. We are using this query parser plugin
(*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
synonyms. So it works slower than edismax, and that is why it is not in
contrib, right? (I am asking because we are using it for all our
searches, to handle 10 multi-word synonyms: ice cube, icecube, etc.)

*Moreover, I thought of a solution for this docValues problem*

I need to make the city field *multivalued*, and by this I mean I will add
the synonym (*mumbai, bombay*) as an extra value to that field if present.
Searching will then work fine as before.


 *<field name="city">mumbai</field><field name="city">bombay</field>*


The only problem is that we have to remove the 'city alias/synonym' facets
when we provide results to the clients.

*mumbai, 1000*
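
A minimal sketch of the "remove the alias facets" step, done on the client after the facet counts come back. It assumes the counts arrive as (value, count) pairs and that every document carrying an alias also carries the canonical name, as in the multivalued approach above. The alias table and the function name are hypothetical, not part of Solr.

```python
# Hypothetical alias table: alias -> canonical city name (an assumption for
# illustration; in practice this would come from your own synonym list).
CITY_ALIASES = {"bombay": "mumbai"}


def drop_alias_facets(facets, aliases=CITY_ALIASES):
    """Return the (value, count) pairs whose value is not a known alias.

    Because each document holds both the canonical name and its alias,
    the canonical facet already counts every matching document, so the
    alias entry can simply be dropped instead of summed.
    """
    return [(value, count) for value, count in facets
            if value.lower() not in aliases]


if __name__ == "__main__":
    raw = [("mumbai", 1000), ("bombay", 1000), ("delhi", 400)]
    print(drop_alias_facets(raw))  # [('mumbai', 1000), ('delhi', 400)]
```

If some documents carried only the alias and not the canonical value, dropping the alias entry would undercount, so this only fits the "add both values to every document" scheme described above.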


With Regards
Aman Tandon

On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Do take time for performance testing with that parser. It can be slow
 depending on your
 data as I remember. That said it solves the problem it set out to
 solve so if it meets
 your SLAs, it can be a life-saver.

 Best,
 Erick


 On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
 benedetti.ale...@gmail.com wrote:
  Even if a little bit outdated, that query parser is really really cool to
  manage synonyms !
  +1 !
 
  2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:
 
  Thanks chris.
 
  Yes we are using it for handling multiword synonym problem.
 
  With Regards
  Aman Tandon
 
  On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles 
  charles.reit...@tiaa-cref.org wrote:
 
   Again, I would recommend using Nolan Lawson's
   SynonymExpandingExtendedDismaxQParserPlugin.
  
   http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
  
   -Original Message-
   From: Aman Tandon [mailto:amantandon...@gmail.com]
   Sent: Wednesday, May 27, 2015 6:42 PM
   To: solr-user@lucene.apache.org
   Subject: Re: docValues: Can we apply synonym
  
   Ok and what synonym processor you is talking about maybe it could
 help ?
  
   With Regards
   Aman Tandon
  
   On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles 
   charles.reit...@tiaa-cref.org wrote:
  
Sorry, my bad.   The synonym processor I mention works differently.
  It's
an extension of the EDisMax query processor and doesn't require
 field
level synonym configs.
   
-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Wednesday, May 27, 2015 6:12 PM
To: solr-user@lucene.apache.org
Subject: RE: docValues: Can we apply synonym
   
But the query analysis isn't on a specific field, it is applied to
 the
query string.
   
-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com]
Sent: Wednesday, May 27, 2015 6:08 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
   
Hi Charles,
   
The problem here is that the docValues works only with primitives
 data
type only like String, int, etc So how could we apply synonym on
primitive data type.
   
With Regards
Aman Tandon
   
On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:
   
 Is there any reason you cannot apply the synonyms at query time?
  Applying synonyms at indexing time has problems, e.g. polluting
 the
 term frequency for synonyms added, preventing distance queries,
 ...

 Since city names often have multiple terms, e.g. New York, Den
 Hague, etc., I would recommend using Nolan Lawson's
 SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
filling.


 http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

 We found this to fix synonyms like ny for New York and vice
  versa.
 Haven't tried it with docValues, tho.

 -Original Message-
 From: Aman Tandon [mailto:amantandon...@gmail.com]
 Sent: Tuesday, May 26, 2015 11:15 PM
 To: solr-user@lucene.apache.org
 Subject: Re: docValues: Can we apply synonym

 Yes it could be :)

 Anyway thanks for helping.

 With Regards
 Aman Tandon

 On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

  I should investigate that, as usually synonyms are analysis
 stage.
  A simple way is to replace the word with all its synonyms (
  including original word), but simply using this kind of
 processor
  will change the token position and offsets, modifying the actual
  content of the
 document .
 
   I am from Bombay will become  I am from Bombay Mumbai which
  can be annoying.
  So a clever approach must be investigated.
 
  2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com
 :
 
   Okay So how could I do it with UpdateProcessors?
  
   With Regards
   Aman Tandon
  
   On Tue

Re: docValues: Can we apply synonym

2015-05-29 Thread Upayavira
Use copyField to clone the field for faceting purposes.

Upayavira

On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote:
 Hi Erick,
 
 Thanks for suggestion, We are this query parser plugin (
 *SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
 synonym. So it does work slower than edismax that's why it is not in
 contrib right? (I am asking this question because we are using for all
 our
 searches to handle 10 multiword ice cube, icecube etc)
 
 *Moreover I thought a solution for this docValue problem*
 
 I need to make city field as *multivalued* and by this I mean i will add
 the synonym (*mumbai, bombay*) as an extra value to that field if
 present.
 Now searching operation will work fine as before.
 
 
  *<field name="city">mumbai</field><field name="city">bombay</field>*
 
 
 The only prob is if we have to remove the 'city alias/synonym facets'
 when
 we are providing results to the clients.
 
 *mumbai, 1000*
 
 
 With Regards
 Aman Tandon
 
 On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Do take time for performance testing with that parser. It can be slow
  depending on your
  data as I remember. That said it solves the problem it set out to
  solve so if it meets
  your SLAs, it can be a life-saver.
 
  Best,
  Erick
 
 
  On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
  benedetti.ale...@gmail.com wrote:
   Even if a little bit outdated, that query parser is really really cool to
   manage synonyms !
   +1 !
  
   2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:
  
   Thanks chris.
  
   Yes we are using it for handling multiword synonym problem.
  
   With Regards
   Aman Tandon
  
   On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles 
   charles.reit...@tiaa-cref.org wrote:
  
Again, I would recommend using Nolan Lawson's
SynonymExpandingExtendedDismaxQParserPlugin.
   
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
   
-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com]
Sent: Wednesday, May 27, 2015 6:42 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
   
Ok and what synonym processor you is talking about maybe it could
  help ?
   
With Regards
Aman Tandon
   
On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:
   
 Sorry, my bad.   The synonym processor I mention works differently.
   It's
 an extension of the EDisMax query processor and doesn't require
  field
 level synonym configs.

 -Original Message-
 From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
 Sent: Wednesday, May 27, 2015 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: RE: docValues: Can we apply synonym

 But the query analysis isn't on a specific field, it is applied to
  the
 query string.

 -Original Message-
 From: Aman Tandon [mailto:amantandon...@gmail.com]
 Sent: Wednesday, May 27, 2015 6:08 PM
 To: solr-user@lucene.apache.org
 Subject: Re: docValues: Can we apply synonym

 Hi Charles,

 The problem here is that the docValues works only with primitives
  data
 type only like String, int, etc So how could we apply synonym on
 primitive data type.

 With Regards
 Aman Tandon

 On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
 charles.reit...@tiaa-cref.org wrote:

  Is there any reason you cannot apply the synonyms at query time?
   Applying synonyms at indexing time has problems, e.g. polluting
  the
  term frequency for synonyms added, preventing distance queries,
  ...
 
  Since city names often have multiple terms, e.g. New York, Den
  Hague, etc., I would recommend using Nolan Lawson's
  SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
 filling.
 
 
  http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
 
  We found this to fix synonyms like ny for New York and vice
   versa.
  Haven't tried it with docValues, tho.
 
  -Original Message-
  From: Aman Tandon [mailto:amantandon...@gmail.com]
  Sent: Tuesday, May 26, 2015 11:15 PM
  To: solr-user@lucene.apache.org
  Subject: Re: docValues: Can we apply synonym
 
  Yes it could be :)
 
  Anyway thanks for helping.
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
   I should investigate that, as usually synonyms are analysis
  stage.
   A simple way is to replace the word with all its synonyms (
   including original word), but simply using this kind of
  processor
   will change the token position and offsets, modifying the actual
   content of the
  document .
  
I am from Bombay will become  I am from Bombay Mumbai which

RE: docValues: Can we apply synonym

2015-05-28 Thread Reitzel, Charles
Again, I would recommend using Nolan Lawson's 
SynonymExpandingExtendedDismaxQParserPlugin.

http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com] 
Sent: Wednesday, May 27, 2015 6:42 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym

Ok and what synonym processor you is talking about maybe it could help ?

With Regards
Aman Tandon

On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles  
charles.reit...@tiaa-cref.org wrote:

 Sorry, my bad.   The synonym processor I mention works differently.  It's
 an extension of the EDisMax query processor and doesn't require field 
 level synonym configs.

 -Original Message-
 From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
 Sent: Wednesday, May 27, 2015 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: RE: docValues: Can we apply synonym

 But the query analysis isn't on a specific field, it is applied to the 
 query string.

 -Original Message-
 From: Aman Tandon [mailto:amantandon...@gmail.com]
 Sent: Wednesday, May 27, 2015 6:08 PM
 To: solr-user@lucene.apache.org
 Subject: Re: docValues: Can we apply synonym

 Hi Charles,

 The problem here is that the docValues works only with primitives data 
 type only like String, int, etc So how could we apply synonym on 
 primitive data type.

 With Regards
 Aman Tandon

 On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles  
 charles.reit...@tiaa-cref.org wrote:

  Is there any reason you cannot apply the synonyms at query time?
   Applying synonyms at indexing time has problems, e.g. polluting the 
  term frequency for synonyms added, preventing distance queries, ...
 
  Since city names often have multiple terms, e.g. New York, Den 
  Hague, etc., I would recommend using Nolan Lawson's
  SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
 filling.
 
  http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
 
  We found this to fix synonyms like ny for New York and vice versa.
  Haven't tried it with docValues, tho.
 
  -Original Message-
  From: Aman Tandon [mailto:amantandon...@gmail.com]
  Sent: Tuesday, May 26, 2015 11:15 PM
  To: solr-user@lucene.apache.org
  Subject: Re: docValues: Can we apply synonym
 
  Yes it could be :)
 
  Anyway thanks for helping.
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti  
  benedetti.ale...@gmail.com wrote:
 
   I should investigate that, as usually synonyms are analysis stage.
   A simple way is to replace the word with all its synonyms ( 
   including original word), but simply using this kind of processor 
   will change the token position and offsets, modifying the actual 
   content of the
  document .
  
I am from Bombay will become  I am from Bombay Mumbai which 
   can be annoying.
   So a clever approach must be investigated.
  
   2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:
  
Okay So how could I do it with UpdateProcessors?
   
With Regards
Aman Tandon
   
On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti  
benedetti.ale...@gmail.com wrote:
   
 mmm this is different !
 Without any customisation, right now you could :
 - use docValues to provide exact value facets.
 - Than you can use a copy field, with the proper analysis, to 
 search
when a
 user click on a filter !

 So you will see in your facets :
 Mumbai(3)
 Bombay(2)

 And when clicking you see 5 results.
 A little bit misleading for the users …

 On the other hand if you you want to apply the synonyms 
 before, the indexing pipeline ( because docValues field can 
 not be analysed), I
   think
 you should play with UpdateProcessors.

 Cheers

 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  We are interested in using docValues for better memory 
  utilization
   and
  speed.
 
  Currently we are faceting the search results on *city. *In 
  city we
   have
  also added the synonym for cities like mumbai, bombay (These 
  are
   Indian
  cities). So that result of mumbai is also eligible when 
  somebody will applying filter of bombay on search results.
 
  I need this functionality to apply with docValues enabled field.
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti  
  benedetti.ale...@gmail.com wrote:
 
   I checked in the Documentation to be sure, but apparently :
  
   DocValues are only available for specific field types. The 
   types
chosen
   determine the underlying Lucene docValue type that will be
 used.
   The
   available Solr field types are:
  
  - StrField and UUIDField.
  - If the field is single-valued (i.e., multi-valued is 
   false),
 Lucene

Re: docValues: Can we apply synonym

2015-05-28 Thread Aman Tandon
Thanks Chris.

Yes, we are using it to handle the multi-word synonym problem.

With Regards
Aman Tandon


Re: docValues: Can we apply synonym

2015-05-27 Thread Aman Tandon
OK, and which synonym processor are you talking about? Maybe it could help.

With Regards
Aman Tandon


RE: docValues: Can we apply synonym

2015-05-27 Thread Reitzel, Charles
Is there any reason you cannot apply the synonyms at query time?   Applying 
synonyms at indexing time has problems, e.g. polluting the term frequency for 
synonyms added, preventing distance queries, ...

Since city names often have multiple terms, e.g. New York, Den Hague, etc., I 
would recommend using Nolan Lawson's 
SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less filling.

http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

We found this to fix synonyms like "ny" for "New York" and vice versa.  Haven't
tried it with docValues, though.
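
To illustrate the general idea of query-time expansion against an un-analysed (docValues) field, here is a minimal client-side sketch. It is not how SynonymExpandingExtendedDismaxQParserPlugin works internally; it only shows a filter value being widened to its synonym group before the query is sent. The synonym groups, the function name, and the assumption that city values are indexed in lowercase are all hypothetical.

```python
# Hypothetical synonym groups; each set lists values treated as equivalent.
SYNONYM_GROUPS = [{"mumbai", "bombay"}]


def expand_filter(field, value, groups=SYNONYM_GROUPS):
    """Build a filter-query string matching the value or any of its synonyms.

    Assumes the field stores lowercase exact values (e.g. a string field
    with docValues), so each synonym is matched as a quoted phrase.
    """
    wanted = value.strip().lower()
    terms = {wanted}
    for group in groups:
        if wanted in group:
            terms |= group

    def quote(term):
        # Escape backslashes and double quotes inside the quoted phrase.
        return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return f"{field}:(" + " OR ".join(quote(t) for t in sorted(terms)) + ")"


if __name__ == "__main__":
    print(expand_filter("city", "Bombay"))  # city:("bombay" OR "mumbai")
```

The resulting string could be passed as an `fq` parameter, leaving the indexed values themselves untouched.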

-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com] 
Sent: Tuesday, May 26, 2015 11:15 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym

Yes it could be :)

Anyway thanks for helping.

With Regards
Aman Tandon

On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti  
benedetti.ale...@gmail.com wrote:

 I should investigate that, as synonyms are usually handled at the analysis stage.
 A simple way is to replace the word with all its synonyms (including the
 original word), but simply using this kind of processor will change
 the token positions and offsets, modifying the actual content of the document.

 "I am from Bombay" will become "I am from Bombay Mumbai", which can
 be annoying.
 So a cleverer approach must be investigated.

 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Okay So how could I do it with UpdateProcessors?
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti  
  benedetti.ale...@gmail.com wrote:
 
   Mmm, this is different!
   Without any customisation, right now you could:
   - use docValues to provide exact-value facets.
   - Then use a copy field, with the proper analysis, to search when a
   user clicks on a filter!

   So you will see in your facets:
   Mumbai(3)
   Bombay(2)

   And when clicking you see 5 results.
   A little bit misleading for the users…

   On the other hand, if you want to apply the synonyms beforehand, in
   the indexing pipeline (because a docValues field cannot be analysed), I think
   you should play with UpdateProcessors.
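
A rough sketch of the logic such an index-time step could apply, written in plain Python rather than against Solr's actual update processor API: each city value is mapped to one canonical name before the document is indexed, so the facet shows a single entry. The alias table, the function name, and the choice to lowercase every value are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical alias table (alias -> canonical name); not taken from the thread.
CANONICAL_CITY = {"bombay": "mumbai"}


def canonicalize_city(doc, field="city", table=CANONICAL_CITY):
    """Return a copy of doc whose city value(s) use canonical names.

    Values are trimmed and lowercased (an assumption), mapped through the
    alias table, and de-duplicated while keeping their original order.
    """
    result = dict(doc)
    values = doc.get(field)
    if values is None:
        return result
    single = isinstance(values, str)
    items = [values] if single else list(values)
    seen, mapped = set(), []
    for value in items:
        key = value.strip().lower()
        canonical = table.get(key, key)
        if canonical not in seen:
            seen.add(canonical)
            mapped.append(canonical)
    result[field] = mapped[0] if single else mapped
    return result


if __name__ == "__main__":
    docs = [{"city": c} for c in ["Mumbai"] * 3 + ["Bombay"] * 2]
    counts = Counter(canonicalize_city(d)["city"] for d in docs)
    print(counts)  # Counter({'mumbai': 5})
```

With this mapping the Mumbai(3) / Bombay(2) facet from the example above collapses into one entry with 5 documents, at the cost of no longer storing the original spelling in that field.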
  
   Cheers
  
   2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:
  
We are interested in using docValues for better memory utilization and
speed.

Currently we are faceting the search results on *city*. In city we have
also added synonyms for cities like mumbai, bombay (these are Indian
cities), so that results for mumbai are also eligible when somebody
applies a filter of bombay on the search results.

I need this functionality to work with a docValues-enabled field.
   
With Regards
Aman Tandon
   
On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti  
benedetti.ale...@gmail.com wrote:
   
 I checked in the documentation to be sure, but apparently:

 DocValues are only available for specific field types. The types chosen
 determine the underlying Lucene docValue type that will be used. The
 available Solr field types are:

 - StrField and UUIDField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the SORTED type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.
 - Any Trie* numeric fields and EnumField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the NUMERIC type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.
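
The rules quoted above can be written down as a small lookup. This is only a sketch of the documentation excerpt as quoted in this thread (other Solr versions may support other field types); the function name is hypothetical.

```python
def lucene_docvalues_type(field_type, multi_valued):
    """Map a Solr field type to the Lucene docValues type, per the quoted rules.

    Only the field types named in the quoted documentation are accepted;
    anything else (for example an analysed text field) raises ValueError.
    """
    is_string = field_type in {"StrField", "UUIDField"}
    is_numeric = field_type.startswith("Trie") or field_type == "EnumField"
    if not (is_string or is_numeric):
        raise ValueError(f"docValues not listed as available for {field_type}")
    if multi_valued:
        return "SORTED_SET"
    return "SORTED" if is_string else "NUMERIC"


if __name__ == "__main__":
    print(lucene_docvalues_type("StrField", multi_valued=False))      # SORTED
    print(lucene_docvalues_type("TrieIntField", multi_valued=False))  # NUMERIC
    print(lucene_docvalues_type("StrField", multi_valued=True))       # SORTED_SET
```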


 This means you should not analyse a field where DocValues is enabled.
 Can you explain your use case to us? Why are you interested in synonyms
 at the DocValues level?

 Cheers

 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:

  To my understanding, docValues are just an uninverted index. That is, it
  contains the terms that are generated at the end of an analysis chain.
  Therefore, you simply enable docValues and include the
  SynonymFilterFactory in your analysis.
 
  Is that enough, or are you struggling with some other issue?
 
  Upayavira
 
  On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
   Hi,
  
   We have a field *city* on which docValues are enabled. We need to
   add synonyms to that field, so how could we do it?
  
   With Regards
   Aman Tandon
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England

RE: docValues: Can we apply synonym

2015-05-27 Thread Reitzel, Charles
But the query analysis isn't on a specific field, it is applied to the query 
string.

-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com] 
Sent: Wednesday, May 27, 2015 6:08 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym

Hi Charles,

The problem here is that the docValues works only with primitives data type 
only like String, int, etc So how could we apply synonym on primitive data type.

With Regards
Aman Tandon


Re: docValues: Can we apply synonym

2015-05-27 Thread Aman Tandon
Hi Charles,

The problem here is that docValues work only with primitive data types such as
String, int, etc. So how could we apply synonyms to a primitive data
type?

With Regards
Aman Tandon

On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:

 Is there any reason you cannot apply the synonyms at query time?
  Applying synonyms at indexing time has problems, e.g. polluting the term
 frequency for synonyms added, preventing distance queries, ...

 Since city names often have multiple terms, e.g. New York, Den Haag,
 etc., I would recommend using Nolan Lawson's
 SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less filling.

 http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

 We found this to fix synonyms like ny for New York and vice versa.
 Haven't tried it with docValues, tho.


RE: docValues: Can we apply synonym

2015-05-27 Thread Reitzel, Charles
Sorry, my bad.   The synonym processor I mention works differently.  It's an 
extension of the EDisMax query processor and doesn't require field level 
synonym configs.
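
To make the idea of query-time expansion concrete, here is a minimal sketch in Python. It only illustrates rewriting the query string itself instead of configuring synonyms on a field. The synonym table and the function name `expand_query` are assumptions made for illustration and do not reflect how the EDisMax extension is actually implemented.

```python
# Toy query-time synonym expansion. The synonym table and the function name
# are hypothetical; this is not the plugin's real implementation.
SYNONYMS = {
    "ny": ["new york"],
    "new york": ["ny"],
    "bombay": ["mumbai"],
    "mumbai": ["bombay"],
}


def expand_query(query: str) -> list[str]:
    """Return the normalized query plus one variant per matching synonym."""
    normalized = " ".join(query.lower().split())
    padded = f" {normalized} "
    variants = [normalized]
    for phrase, alternatives in SYNONYMS.items():
        # Match whole words only, so "ny" does not fire inside "sunny".
        if f" {phrase} " in padded:
            for alt in alternatives:
                variant = padded.replace(f" {phrase} ", f" {alt} ").strip()
                if variant not in variants:
                    variants.append(variant)
    return variants
```

Because the expansion happens on the query string, multi-term names such as "new york" can be handled without touching how the field is indexed.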

-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] 
Sent: Wednesday, May 27, 2015 6:12 PM
To: solr-user@lucene.apache.org
Subject: RE: docValues: Can we apply synonym

But the query analysis isn't on a specific field, it is applied to the query 
string.


Re: docValues: Can we apply synonym

2015-05-26 Thread Aman Tandon
Yes it could be :)

Anyway thanks for helping.

With Regards
Aman Tandon

On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 I should investigate that, as synonyms are usually handled at the analysis
 stage. A simple way is to replace the word with all its synonyms (including
 the original word), but simply using this kind of processor will change the
 token positions and offsets, modifying the actual content of the document.

 "I am from Bombay" will become "I am from Bombay Mumbai", which can be
 annoying.
 So a clever approach must be investigated.


Re: docValues: Can we apply synonym

2015-05-26 Thread Upayavira
To my understanding, docValues are just an uninverted index. That is, it
contains the terms that are generated at the end of an analysis chain.
Therefore, you simply enable docValues and include the
SynonymFilterFactory in your analysis.
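
As a rough illustration of what "uninverted index" means here, the following Python sketch builds a term-to-documents map (the inverted index) and then turns it back into a per-document column, which is the shape docValues are described as having. The toy corpus and function names are assumptions. Whether an analysis chain can be applied to a docValues field at all is questioned elsewhere in this thread, so this is a picture of the data layout only.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> terms left after analysis.
DOCS = {
    0: ["mumbai"],
    1: ["bombay"],
    2: ["mumbai"],
}


def build_inverted(docs):
    """term -> list of doc ids, the structure a search looks terms up in."""
    inverted = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            inverted[term].append(doc_id)
    return dict(inverted)


def build_uninverted(inverted):
    """doc id -> terms, rebuilt from the inverted index (a docValues-like column)."""
    column = defaultdict(list)
    for term in sorted(inverted):
        for doc_id in inverted[term]:
            column[doc_id].append(term)
    return dict(column)
```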

Is that enough, or are you struggling with some other issue?

Upayavira

On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
 Hi,
 
 We have a field *city* with docValues enabled. We need to
 add synonyms to that field, so how could we do it?
 
 With Regards
 Aman Tandon


Re: docValues: Can we apply synonym

2015-05-26 Thread Alessandro Benedetti
Mmm, this is different!
Without any customisation, right now you could:
- use docValues to provide exact-value facets.
- Then use a copy field, with the proper analysis, to search when a
user clicks on a filter.

So you will see in your facets :
Mumbai(3)
Bombay(2)

And when clicking you see 5 results.
A little bit misleading for the users …
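
The mismatch between the facet counts and the result count can be reproduced with a small Python sketch. The data, the synonym grouping and the function names are assumptions chosen to match the Mumbai(3) / Bombay(2) example, and nothing here is Solr code.

```python
from collections import Counter

# Hypothetical data: raw city values as stored in the docValues field.
CITY_VALUES = ["Mumbai", "Mumbai", "Mumbai", "Bombay", "Bombay"]

# Hypothetical synonym grouping applied by the analysed copy field.
SYNONYM_GROUP = {"mumbai": "mumbai", "bombay": "mumbai"}


def facet_counts(values):
    """Exact-value facets, as a non-analysed docValues field would report."""
    return Counter(values)


def filter_hits(values, clicked):
    """Documents matching the clicked facet through the synonym-aware copy field."""
    target = SYNONYM_GROUP.get(clicked.lower(), clicked.lower())
    return [v for v in values
            if SYNONYM_GROUP.get(v.lower(), v.lower()) == target]
```

With this data the facets report 3 and 2, while clicking either value returns all 5 documents, which is the misleading behaviour described above.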

On the other hand, if you want to apply the synonyms earlier, in the
indexing pipeline (because a docValues field cannot be analysed), I think
you should play with UpdateProcessors.

Cheers

2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 We are interested in using docValues for better memory utilization and
 speed.

 Currently we are faceting the search results on *city*. In city we have
 also added synonyms for cities like mumbai, bombay (these are Indian
 cities), so that results for mumbai are also eligible when somebody
 applies a filter of bombay on the search results.

 I need this functionality to work with a docValues-enabled field.

 With Regards
 Aman Tandon





-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: docValues: Can we apply synonym

2015-05-26 Thread Aman Tandon
Okay, so how could I do it with UpdateProcessors?

With Regards
Aman Tandon

On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 Mmm, this is different!
 Without any customisation, right now you could:
 - use docValues to provide exact-value facets.
 - Then use a copy field, with the proper analysis, to search when a
 user clicks on a filter.

 So you will see in your facets :
 Mumbai(3)
 Bombay(2)

 And when clicking you see 5 results.
 A little bit misleading for the users …

 On the other hand, if you want to apply the synonyms earlier, in the
 indexing pipeline (because a docValues field cannot be analysed), I think
 you should play with UpdateProcessors.

 Cheers



Re: docValues: Can we apply synonym

2015-05-26 Thread Alessandro Benedetti
I checked in the Documentation to be sure, but apparently :

DocValues are only available for specific field types. The types chosen
determine the underlying Lucene docValue type that will be used. The
available Solr field types are:

   - StrField and UUIDField.
      - If the field is single-valued (i.e., multi-valued is false), Lucene
        will use the SORTED type.
      - If the field is multi-valued, Lucene will use the SORTED_SET type.
   - Any Trie* numeric fields and EnumField.
      - If the field is single-valued (i.e., multi-valued is false), Lucene
        will use the NUMERIC type.
      - If the field is multi-valued, Lucene will use the SORTED_SET type.
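
The mapping quoted above can be written down as a small lookup, sketched here in Python. It encodes only what the quoted documentation says as of this message; the function name is hypothetical, and other Solr versions may list different field types.

```python
def lucene_docvalues_type(field_type: str, multi_valued: bool) -> str:
    """Map a Solr field type to the Lucene docValues type from the quoted list."""
    string_like = {"StrField", "UUIDField"}
    is_numeric_like = field_type.startswith("Trie") or field_type == "EnumField"
    if field_type not in string_like and not is_numeric_like:
        # Anything else (for example an analysed text field) is not in the list.
        raise ValueError(f"docValues not listed for field type {field_type!r}")
    if multi_valued:
        return "SORTED_SET"
    return "SORTED" if field_type in string_like else "NUMERIC"
```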


This means you should not analyse a field where DocValues is enabled.
Can you explain your use case to us? Why are you interested in synonyms
at the DocValues level?

Cheers

2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:

 To my understanding, docValues are just an uninverted index. That is, it
 contains the terms that are generated at the end of an analysis chain.
 Therefore, you simply enable docValues and include the
 SynonymFilterFactory in your analysis.

 Is that enough, or are you struggling with some other issue?

 Upayavira

 On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
  Hi,
 
  We have some field *city* in which the docValues are enabled. We need to
  add the synonym in that field so how could we do it?
 
  With Regards
  Aman Tandon






Re: docValues: Can we apply synonym

2015-05-26 Thread Alessandro Benedetti
I should investigate that, as synonyms are usually handled at the analysis
stage. A simple way is to replace the word with all its synonyms (including
the original word), but simply using this kind of processor will change the
token positions and offsets, modifying the actual content of the document.

"I am from Bombay" will become "I am from Bombay Mumbai", which can be
annoying.
So a clever approach must be investigated.
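
The side effect of the naive approach can be seen in a short Python sketch. A real update processor would be Java code configured in Solr, which this does not attempt to reproduce; the synonym table and function names are assumptions for illustration.

```python
# Hypothetical synonym table; not a real Solr update processor.
SYNONYMS = {"bombay": ["mumbai"], "mumbai": ["bombay"]}


def naive_expand(text: str) -> str:
    """Append every synonym right after the word it belongs to."""
    out = []
    for word in text.split():
        out.append(word)
        for synonym in SYNONYMS.get(word.lower(), []):
            out.append(synonym.capitalize())
    return " ".join(out)


def positions(text: str) -> dict[str, int]:
    """Token -> position, to show how later tokens are shifted."""
    return {token: index for index, token in enumerate(text.split())}
```

Calling `naive_expand("I am from Bombay")` gives the altered text from the example, and comparing `positions` before and after expansion shows that every token following an expanded word moves one slot to the right.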

2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 Okay So how could I do it with UpdateProcessors?

 With Regards
 Aman Tandon







Re: docValues: Can we apply synonym

2015-05-26 Thread Aman Tandon
We are interested in using docValues for better memory utilization and
speed.

Currently we are faceting the search results on *city*. In city we have
also added synonyms for cities like mumbai, bombay (these are Indian
cities), so that results for mumbai are also eligible when somebody
applies a filter of bombay on the search results.

I need this functionality to work with a docValues-enabled field.

With Regards
Aman Tandon

On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 I checked in the Documentation to be sure, but apparently :

 DocValues are only available for specific field types. The types chosen
 determine the underlying Lucene docValue type that will be used. The
 available Solr field types are:

    - StrField and UUIDField.
       - If the field is single-valued (i.e., multi-valued is false), Lucene
         will use the SORTED type.
       - If the field is multi-valued, Lucene will use the SORTED_SET type.
    - Any Trie* numeric fields and EnumField.
       - If the field is single-valued (i.e., multi-valued is false), Lucene
         will use the NUMERIC type.
       - If the field is multi-valued, Lucene will use the SORTED_SET type.


 This means you should not analyse a field where DocValues is enabled.
 Can you explain your use case to us? Why are you interested in synonyms
 at the DocValues level?

 Cheers

 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:

  To my understanding, docValues are just an uninverted index. That is, it
  contains the terms that are generated at the end of an analysis chain.
  Therefore, you simply enable docValues and include the
  SynonymFilterFactory in your analysis.
 
  Is that enough, or are you struggling with some other issue?
 
  Upayavira
 
  On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
   Hi,
  
   We have a field *city* in which docValues are enabled. We need to
   add synonyms to that field, so how could we do it?
  
   With Regards
   Aman Tandon
 






Re: DocValues=true and indexed=false

2015-04-12 Thread david.w.smi...@gmail.com
Yes, surprisingly enough, if indexed=false, docValues=true — you can still
search.  I’ve seen the code behind it; it’s interesting.  Rob wrote it.
I’m not sure how scalable it is compared to the inverted index.  I suspect
it wouldn’t do well for a lot of distinct values but will be fine for a small
number of them.  What definition of “small” though… I don’t know.  I’d love
to see benchmarks of such a comparison.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Apr 9, 2015 at 8:30 PM, Erick Erickson erickerick...@gmail.com
wrote:

 So I was a bit embarrassed to be asked whether there was a use-case
 for this. I did some simple tests on a field (with a magnificent total
 of 32 docs indexed) and, with the exception of an error that
 facet.mincount=0 doesn't work if a field isn't indexed (see
 SOLR-5260), everything I tried worked fine. Searching by wildcards,
 searching with the term query parser, faceting, grouping, whatever. I
 admit I didn't spend very much time looking.

 As I understand it DocValues are basically a serialized UnInverted
 field so I'm wondering how searches work at all. Or, more
 specifically, whether this works fine on small numbers of docs but
 wouldn't scale. Or does a search on a DocValues field build an
 inverted field?

 Or anything else I should know. Is the rule simply 'if you search on
 it, or use it in fq clauses, set indexed=true, and if you facet,
 group etc. set docValues=true'? So it would make sense to set
 docValues=true and indexed=false on a field that's never searched
 or used in an fq clause but used for faceting etc.

 So my mental model is some operations need inverted fields, and some
 need uninverted fields and that docValues provide a way to store
 uninverted fields on disk just like indexed=true allows you to store
 inverted fields on disk. Assuming you need both and set both
 indexed=true and docValues=true,  the _total_ memory requirements
 for Solr are the same. What's NOT the same is that the docValues make
 use of MMapDirectory where uninverting a field doesn't (this last is a
 total guess).
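
 The mental model above can be sketched as schema entries. This is a minimal illustration, not something taken from the thread: the field names title_search, category_facet and brand are hypothetical, and the types assume the stock example schema.

```xml
<!-- Searched or used in fq only: inverted index, no docValues. -->
<field name="title_search" type="text_general" indexed="true" stored="true"/>

<!-- Faceted, grouped or sorted only: uninverted structure on disk,
     no inverted index. -->
<field name="category_facet" type="string" indexed="false" stored="false"
       docValues="true"/>

<!-- Both searched and faceted: keep both structures. -->
<field name="brand" type="string" indexed="true" stored="true"
       docValues="true"/>
```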

 I'm preparing a Google Doc that I'll certainly open to anyone who
 wants to add to it. I'll then add the results into the Reference
 Guide.

 Anyway, you can see I'm confused, but if I ask enough silly questions
 eventually my questions get less silly.

 Erick



Re: DOcValues

2015-04-04 Thread William Bell
Thank you. This is very understandable.

I heard the Strings limitation for DocValues goes away in 5.0?

On Fri, Apr 3, 2015 at 2:35 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Fri, Apr 3, 2015 at 12:52 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

  Shalin Shekhar Mangar shalinman...@gmail.com wrote:
   The UnInvertedField method of faceting is no longer used in Solr.
 
  True. Un-inversion still takes place for non-DV-fields though (see
  UninvertingReader, which seems to lead to
  FieldCacheImpl.SortedDocValuesCache). But the wrapping is far nicer as
  everything looks like DocValues now and it seems (guessing quite a bit
  here) that the old 16M-limitation is gone.
 
 
 Yes, you are right. I didn't mean to imply that fields aren't un-inverted
 at all.


  - Toke Eskildsen
 



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: DOcValues

2015-04-03 Thread Tomoko Uchida
Hi,

According to line 430 in SimpleFacets.java (Solr 5.0.0), the facet method is
forced to fc when we set docValues=true.
https://github.com/apache/lucene-solr/blob/lucene_solr_5_0_0/solr/core/src/java/org/apache/solr/request/SimpleFacets.java#L430

So we need not set facet.method to use doc values. Even if we specify
facet.method=enum, it might be ignored.
If my understanding is wrong, please correct me.

Regards,
Tomoko


2015-04-03 12:01 GMT+09:00 William Bell billnb...@gmail.com:

 If I set indexed=true and docValues=true, when I request
 facet=true&facet.field=manu_exact
 will it use docValues or the indexed version?

 Also, does it help with *Too many values for UnInvertedField faceting*?


 *Do I need to set facet.method when using docValues?*

 <field name="manu_exact" type="string" indexed="true" stored="true"
 docValues="true" />

 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Re: DOcValues

2015-04-03 Thread Toke Eskildsen
William Bell billnb...@gmail.com wrote:

[docValues activation?]

 Also, does it help with *Too many values for UnInvertedField faceting ?*

Yes. There is an internal limit using UnInverted (aka fc without docValues) of 
16M somewhere - I am not sure exactly what it takes to trigger it, but many 
unique values and/or references will do it at some point.

docValues scales quite a bit higher; we have successfully used it with 7 
billion references to 640 million unique values in a single shard (where it 
worked surprisingly well BTW).

As far as I can see, there is an internal limit of 2 billion unique values per 
shard for docValues. I would like to see that go away, but that's just part of 
an ongoing mission to get Solr to break free from the old "2 billion should be 
enough for everyone" design.

- Toke Eskildsen

Re: DOcValues

2015-04-03 Thread Toke Eskildsen
Shalin Shekhar Mangar shalinman...@gmail.com wrote:
 The UnInvertedField method of faceting is no longer used in Solr.

True. Un-inversion still takes place for non-DV-fields though (see 
UninvertingReader, which seems to lead to FieldCacheImpl.SortedDocValuesCache). 
But the wrapping is far nicer as everything looks like DocValues now and it 
seems (guessing quite a bit here) that the old 16M-limitation is gone.

- Toke Eskildsen


Re: DOcValues

2015-04-03 Thread Shalin Shekhar Mangar
Sorry I should have been more clear. The UnInvertedField method of faceting
is not used in Solr since Solr 5.0.

On Fri, Apr 3, 2015 at 12:17 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 The UnInvertedField method of faceting is no longer used in Solr.

 See https://issues.apache.org/jira/browse/SOLR-7190

 On Fri, Apr 3, 2015 at 10:33 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

 William Bell billnb...@gmail.com wrote:

 [docValues activation?]

  Also, does it help with *Too many values for UnInvertedField faceting
 ?*

 Yes. There is an internal limit using UnInverted (aka fc without
 docValues) of 16M somewhere - I am not sure exactly what it takes to
 trigger it, but many unique values and/or references will do it at some
 point.

 docValues scales quite a bit higher; we have successfully used it with 7
 billion references to 640 million unique values in a single shard (where it
 worked surprisingly well BTW).

 As far as I can see, there is an internal limit of 2 billion unique
 values per shard for docValues. I would like to see that go away, but
 that's just part of an ongoing mission to get Solr to break free from the
 old "2 billion should be enough for everyone" design.

 - Toke Eskildsen




 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: DOcValues

2015-04-03 Thread Shalin Shekhar Mangar
The UnInvertedField method of faceting is no longer used in Solr.

See https://issues.apache.org/jira/browse/SOLR-7190

On Fri, Apr 3, 2015 at 10:33 AM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 William Bell billnb...@gmail.com wrote:

 [docValues activation?]

  Also, does it help with *Too many values for UnInvertedField faceting
 ?*

 Yes. There is an internal limit using UnInverted (aka fc without
 docValues) of 16M somewhere - I am not sure exactly what it takes to
 trigger it, but many unique values and/or references will do it at some
 point.

 docValues scales quite a bit higher; we have successfully used it with 7
 billion references to 640 million unique values in a single shard (where it
 worked surprisingly well BTW).

 As far as I can see, there is an internal limit of 2 billion unique values
 per shard for docValues. I would like to see that go away, but that's just
 part of an ongoing mission to get Solr to break free from the old "2
 billion should be enough for everyone" design.

 - Toke Eskildsen




-- 
Regards,
Shalin Shekhar Mangar.


Re: DOcValues

2015-04-03 Thread Shalin Shekhar Mangar
On Fri, Apr 3, 2015 at 12:52 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 Shalin Shekhar Mangar shalinman...@gmail.com wrote:
  The UnInvertedField method of faceting is no longer used in Solr.

 True. Un-inversion still takes place for non-DV-fields though (see
 UninvertingReader, which seems to lead to
 FieldCacheImpl.SortedDocValuesCache). But the wrapping is far nicer as
 everything looks like DocValues now and it seems (guessing quite a bit
 here) that the old 16M-limitation is gone.


Yes, you are right. I didn't mean to imply that fields aren't un-inverted
at all.


 - Toke Eskildsen




-- 
Regards,
Shalin Shekhar Mangar.


Re: DOcValues

2015-04-03 Thread Tomoko Uchida
Dear Shawn,

Thank you for the detailed explanation!
Many users would need such guidelines about memory consumption (and
performance trade-offs) for facets.

Thanks,
Tomoko

2015-04-03 22:26 GMT+09:00 Shawn Heisey apa...@elyograg.org:

 On 4/3/2015 6:53 AM, Tomoko Uchida wrote:
  According to line 430 in SimpleFacets.java (Solr 5.0.0), facet method is
  forced to fc when we set docValues=true.
 
 https://github.com/apache/lucene-solr/blob/lucene_solr_5_0_0/solr/core/src/java/org/apache/solr/request/SimpleFacets.java#L430
 
  So we need not set facet.method to use doc values. Even if we specify
  facet.method=enum, it might be ignored.
  If my understanding is wrong, please correct that.

 That code certainly looks like facet.method=enum is ignored when
 docValues are present.

 As I understand it, the only disadvantage to facet.method=fc (when
 docValues are not present) is that it uses a lot of heap memory in the
 FieldCache (or whatever replaces FieldCache in 5.0).  That memory
 structure makes subsequent facets much faster, but on a large index, the
 memory required can be astronomical.  The enum method skips that
 caching, relying on the operating system to cache the data in the index
 itself.  If there's enough memory for good OS caching, enum can be
 almost as fast as fc, with a much smaller Java heap.

 On a field with docValues, the large memory structure is not required,
 and an optimized code path is used.  Based on the comment in the java
 code that you highlighted, it sounds like only fc will do docValues, but
 no handling is present for the fcs method with docValues, which would
 seem to contradict that comment a little bit.

 Thanks,
 Shawn




Re: DOcValues

2015-04-03 Thread Shawn Heisey
On 4/3/2015 6:53 AM, Tomoko Uchida wrote:
 According to line 430 in SimpleFacets.java (Solr 5.0.0), facet method is
 forced to fc when we set docValues=true.
 https://github.com/apache/lucene-solr/blob/lucene_solr_5_0_0/solr/core/src/java/org/apache/solr/request/SimpleFacets.java#L430
 
 So we need not set facet.method to use doc values. Even if we specify
 facet.method=enum, it might be ignored.
 If my understanding is wrong, please correct that.

That code certainly looks like facet.method=enum is ignored when
docValues are present.

As I understand it, the only disadvantage to facet.method=fc (when
docValues are not present) is that it uses a lot of heap memory in the
FieldCache (or whatever replaces FieldCache in 5.0).  That memory
structure makes subsequent facets much faster, but on a large index, the
memory required can be astronomical.  The enum method skips that
caching, relying on the operating system to cache the data in the index
itself.  If there's enough memory for good OS caching, enum can be
almost as fast as fc, with a much smaller Java heap.

On a field with docValues, the large memory structure is not required,
and an optimized code path is used.  Based on the comment in the java
code that you highlighted, it sounds like only fc will do docValues, but
no handling is present for the fcs method with docValues, which would
seem to contradict that comment a little bit.
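
The trade-off described above can be expressed as request-handler defaults in solrconfig.xml. This is only a sketch under stated assumptions: the handler name /facets and the field cat_nodv (a field without docValues) are hypothetical, while manu_exact is the docValues field from earlier in the thread.

```xml
<requestHandler name="/facets" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>

    <!-- docValues field: no facet.method is set, since the code discussed
         in this thread appears to force fc for such fields anyway. -->
    <str name="facet.field">manu_exact</str>

    <!-- Hypothetical field without docValues: a per-field override asks for
         enum, trading the heap-resident cache for OS-level disk caching. -->
    <str name="facet.field">cat_nodv</str>
    <str name="f.cat_nodv.facet.method">enum</str>
  </lst>
</requestHandler>
```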

Thanks,
Shawn



Re: DocValues without re-index?

2014-07-22 Thread Mikhail Khludnev
Michael,

What do you mean by "first re-indexing"?
I'm sure you are aware of binary/numeric DocValues updates, but those only
work for existing column strides. I guess you are talking about something
like a sidecar index: http://www.youtube.com/watch?v=9h3ax5Wmxpk



On Tue, Jul 22, 2014 at 6:50 AM, Michael Ryan mr...@moreover.com wrote:

 Is it possible to use DocValues on an existing index without first
 re-indexing?

 -Michael




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


RE: DocValues without re-index?

2014-07-22 Thread Michael Ryan
I mean re-adding all of the documents in my index. The DocValues wiki page says 
that this is necessary, but I wanted to know if there was a way around it.

-Michael

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Tuesday, July 22, 2014 2:14 AM
To: solr-user
Subject: Re: DocValues without re-index?

Michael,

What do you mean by "first re-indexing"?
I'm sure you are aware of binary/numeric DocValues updates, but those only work 
for existing column strides. I guess you are talking about something like a 
sidecar index: http://www.youtube.com/watch?v=9h3ax5Wmxpk



On Tue, Jul 22, 2014 at 6:50 AM, Michael Ryan mr...@moreover.com wrote:

 Is it possible to use DocValues on an existing index without first 
 re-indexing?

 -Michael




--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: DocValues without re-index?

2014-07-22 Thread Shawn Heisey
On 7/22/2014 6:14 AM, Michael Ryan wrote:
 I mean re-adding all of the documents in my index. The DocValues wiki page 
 says that this is necessary, but I wanted to know if there was a way around 
 it.

If your index meets the strict criteria for Atomic Updates, you could
update all the documents by setting one field to the value it's
already got.

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
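
As a rough sketch of the suggestion above, an atomic update in Solr's XML update format that sets a field to the value it already holds could look like the following. The uniqueKey field id, the document id doc1, the field popularity and the value 6 are all assumptions for illustration; the approach only applies if the index meets the Atomic Updates caveats linked above.

```xml
<add>
  <doc>
    <!-- uniqueKey of an existing document (hypothetical id). -->
    <field name="id">doc1</field>
    <!-- "set" to the value the document already has; Solr then rebuilds
         the whole document from its stored fields and re-adds it. -->
    <field name="popularity" update="set">6</field>
  </doc>
</add>
```

One such update would be needed for every document in the index, each carrying that document's own current value.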

If the index does not meet the requirements for Atomic Updates, then
you'll need to completely reindex after adding docValues to a field.
Features that use docValues (like sorting and facets) will not work on
that field until you reindex.  As I understand it, those features cannot
fall back to indexed values.

It sounds like you already know about what this page says:

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: DocValues and StatsComponent

2014-04-27 Thread Ahmet Arslan
Hi Harish,

I created https://issues.apache.org/jira/browse/SOLR-6024 on your behalf.

Ahmet



On Friday, April 4, 2014 3:13 AM, Ahmet Arslan iori...@yahoo.com wrote:
Hi Harish,

I reproduced your problem with the example/default setup.

I enabled docValues on the example fields (and deleted the original ones) and 
indexed the example documents.

<field name="popularity" type="int" indexed="true" stored="true"
docValues="true" />
<field name="manu_exact" type="string" indexed="false" stored="false"
docValues="true" />
<field name="cat" type="string" indexed="true" stored="true" docValues="true"
multiValued="true"/>

Single-valued fields work fine. But stats on the multi-valued field cat yields 

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&stats=true&stats.field=cat


msg: Type mismatch: cat was indexed as SORTED_SET, code: 400

And the Confluence documentation does not say anything about this.

Can you file a JIRA issue?

Ahmet


On Thursday, April 3, 2014 11:01 PM, Harish Agarwal harish.agar...@gmail.com 
wrote:
Is there a known issue using the StatsComponent against fields indexed with
docvalues?  My setup is currently throwing this error (against the latest
nightly build):

org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
Type mismatch: INTEGER_4 was indexed as SORTED_SET



Re: DocValues and StatsComponent

2014-04-03 Thread Ahmet Arslan
Hi Harish,

I reproduced your problem with the example/default setup.

I enabled docValues on the example fields (and deleted the original ones) and 
indexed the example documents.

<field name="popularity" type="int" indexed="true" stored="true"
docValues="true" />
<field name="manu_exact" type="string" indexed="false" stored="false"
docValues="true" />
<field name="cat" type="string" indexed="true" stored="true" docValues="true"
multiValued="true"/>

Single-valued fields work fine. But stats on the multi-valued field cat yields 

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&stats=true&stats.field=cat


msg: Type mismatch: cat was indexed as SORTED_SET, code: 400

And the Confluence documentation does not say anything about this.

Can you file a JIRA issue?

Ahmet

On Thursday, April 3, 2014 11:01 PM, Harish Agarwal harish.agar...@gmail.com 
wrote:
Is there a known issue using the StatsComponent against fields indexed with
docvalues?  My setup is currently throwing this error (against the latest
nightly build):

org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
Type mismatch: INTEGER_4 was indexed as SORTED_SET


