[jira] [Commented] (SOLR-7036) Faster method for group.facet

2016-10-25 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605136#comment-15605136
 ] 

Danny Teichthal commented on SOLR-7036:
---

[~ysee...@gmail.com] I want to make sure I got this right.
You say that the group.facet functionality is exactly the same as calling JSON 
API with unique function applied on the group.field.
For current patch it means:
No need to touch the UnInverted field code, instead we should just pass the 
calculation to the JSON API and add a unique calculation on the group.field 
from our query.
Did I get it right?

If it is true:
1. Will it have the same or better performance than current patch?
2. Will we also have transparent support also in prefix facet and facet query 
(the patch doesn't support this currently)?



> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, SOLR-7036_zipped.zip, 
> jstack-output.txt, performance.txt, source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7036) Faster method for group.facet

2016-08-16 Thread Danny Teichthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Teichthal updated SOLR-7036:
--
Attachment: SOLR-7036.patch

Regenerated the patch.
I was able to apply it using command - git apply --ignore-space-change 
--ignore-whitespace --whitespace=nowarn SOLR-7036.patch

> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, SOLR-7036_zipped.zip, 
> jstack-output.txt, performance.txt, source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7036) Faster method for group.facet

2016-08-16 Thread Danny Teichthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Teichthal updated SOLR-7036:
--
Attachment: SOLR-7036_zipped.zip

> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, SOLR-7036.patch, SOLR-7036_zipped.zip, jstack-output.txt, 
> performance.txt, source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7036) Faster method for group.facet

2016-08-16 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423309#comment-15423309
 ] 

Danny Teichthal commented on SOLR-7036:
---

[~erickerickson] I followed the HowToContribute wiki.
Please forgive me because I'm newbie on git.
If it matters, I work on windows, with eclipse,  git version - 2.9.0.windows.1.
Executed the commands from windows command line.

I pulled the latest version today.
Fixed my old files (currently attached to JIRa as a zip file) to pass 
compilation. 
I committed my changes to my local repository.
Then, executed patch command:
This is the exact command I executed "git format-patch -w -b origin/master 
--stdout > SOLR-7036.patch"

What am I doing wrong?

Meanwhile, I'm uploading a zip file containing my changes.
It contains the directory structure, so it can be just extracted if you are 
inside directory "lucene-solr"

> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, SOLR-7036.patch, jstack-output.txt, performance.txt, 
> source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7036) Faster method for group.facet

2016-08-16 Thread Danny Teichthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Teichthal updated SOLR-7036:
--
Attachment: SOLR-7036.patch

Uploading the patch again.
It appears that was a refactoring task of JSON facet code.
I merged my changes with latest head revision.
Running tests again for checking the old method, just to verify.
I don't have the full numbers yet.
But from partial results I can confirm that the performance of the old method 
is still not good and measured in seconds on our data sets.
[~erickerickson] please let me know if you still have problems applying the 
patch.


> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, SOLR-7036.patch, jstack-output.txt, performance.txt, 
> source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7036) Faster method for group.facet

2016-08-15 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422190#comment-15422190
 ] 

Danny Teichthal commented on SOLR-7036:
---

[~erickerickson]
1. The tests that were made by Miri, who uploaded the patch, were against 
trunk. You can see the numbers on her comment. The 50 percentile with the old 
method were ~6 seconds on our data set comparing to ~300 milliseconds with the 
patch.
2. I will check what went wrong with the patch and upload a new one.

> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, jstack-output.txt, performance.txt, source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7036) Faster method for group.facet

2016-08-14 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420266#comment-15420266
 ] 

Danny Teichthal commented on SOLR-7036:
---

Hi [~jpswain],
Did you have the chance to run your tests yet?

> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, jstack-output.txt, performance.txt, source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7036) Faster method for group.facet

2016-07-21 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387826#comment-15387826
 ] 

Danny Teichthal commented on SOLR-7036:
---

[~jpswain], thanks, looking forward to it.

> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, jstack-output.txt, performance.txt, source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7036) Faster method for group.facet

2016-07-20 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386056#comment-15386056
 ] 

Danny Teichthal commented on SOLR-7036:
---

[~jpswain] Hi, I'm a team member of [~mdvir1].
I suspect that the problem is that your field is defined with docValues=true.
Can you check again in your logs if you can find some error in the pattern 
"Type mismatc"  "was indexed as" ..., it should be indise 
class DocTermOrds.
If you also have the option to remove docValues=true from your field and test 
the new method again it would be grate.

In short, the new method does not support fields with doc values.
There's a related Lucene JIRA issue that describes that about UninvertedField - 
LUCENE-7119.

I see 2 options for a fix in the patch:
1. We should not allow using group.facet.method=uif on doc values field and 
throw a descriptive exception on this case.
2. In case we hit a doc value field - fallback to the original method.

Personally, I prefer to throw an exception, this way new consumers of this 
method will not get confused and will get a descriptive error.

The second option is used on the JSON facet API.
If you will facet on that field without grouping and set "facet.method=uif", 
then FacetField.createFacetProcessor will detect that it is a docValue and fall 
back to using the doc values processor instead of UIF.



Regarding the attached stacktrace:
I analyzed the stacktrace that you attached and it looks like there is only one 
active thread that is waiting on a monitor for the field value cache to be 
populated.
It is stuck on UnInvertedField.getUnInvertedField.
Since there's no other thread in work, it looks like there was another thread 
that tried to create the uninverted field and got an exception, in this case it 
will not notify other threads and thus you are stuck on wait().
I'm not familiar with the code that much, but it looks like it could be solved 
if the creation of UninvertedField was inside a try/finally block and 
cache.notifyAll(); was placed in the finally.

Would appreciate if you could confirm or deny this theory.



> Faster method for group.facet
> -
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 4.10.3
>Reporter: Jim Musil
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, jstack-output.txt, performance.txt, source_for_patch.zip
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or 
> terms was high, the latency spiked to multiple second requests. This solution 
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or 
> the original. Add group.facet.method=fc to use the improved method or 
> group.facet.method=original which is the default if not specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2016-06-23 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345928#comment-15345928
 ] 

Danny Teichthal commented on SOLR-6492:
---

[~solrtrey] - great, looking forward too.
Are there any other planned changes except from the new license?

> Solr field type that supports multiple, dynamic analyzers
> -
>
> Key: SOLR-6492
> URL: https://issues.apache.org/jira/browse/SOLR-6492
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Trey Grainger
> Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for German), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (i.e. turning 
> on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2015-11-01 Thread Danny Teichthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984312#comment-14984312
 ] 

Danny Teichthal commented on SOLR-6492:
---

Hi [~solrtrey],
We started exploring your github code for indexing multi lingual text to a 
single field for internal project.
It looks like the best solution for our requirements.
What is holding us back is the question if and when it will be part of Solr.
Is there any planned progress for this JIRA issue?


Thanks,




> Solr field type that supports multiple, dynamic analyzers
> -
>
> Key: SOLR-6492
> URL: https://issues.apache.org/jira/browse/SOLR-6492
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Trey Grainger
> Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for German), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (i.e. turning 
> on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org