Re: Moving usage documentation of Luke to Lucene Website

2021-05-11 Thread Dmitry Kan
Hi Michael,

The page looks great, thanks for including the link to my fork with
older releases!

On Mon, 10 May 2021 at 13:24, Michael Wechner 
wrote:

> Hi Together
>
> I have made a draft of the README (including logos/images) at
>
> https://github.com/wyona/apache-lucene-luke
>
> Let me know what you think and I can copy it to
>
> https://github.com/apache/lucene/tree/main/lucene/luke
>
> and make a PR.
>
> Thanks
>
> Michael
>
>
>
> On 04.05.21 at 07:15, Dmitry Kan wrote:
>
> Hello!
>
> Glad to see some docs Tomoko and I wrote years ago turned out to be useful
> and on the way to Lucene's docs -- just a suggestion: you could also
> have a link to older (pre-8.x) releases of luke, as there may well be
> users of those versions?
>
> https://github.com/DmitryKey/luke/releases
>
> Best,
>
> Dmitry
>
> On Sun, 2 May 2021 at 10:48, Tomoko Uchida 
> wrote:
>
>> Please feel free to make a PR if you'd like (opening a Jira issue with it
>> would be better).
>> I just wanted to leave some comments about the contents.
>>
>> > - About Luke
>> It would still make sense to me to copy this to the Lucene repo.
>>
>> > - Launching Luke
>> This section is outdated except for step 3.
>>
>> > - Search engines luke can deal with
>> Personally I don't think we should mention/emphasize specific 3rd party
>> projects on our official website.
>>
>> > - Brief project history
>> This section is already included in Luke itself - its "About" dialog.
>>
>> Tomoko
>>
>>
>> 2021年5月2日(日) 15:20 Michael Wechner :
>>
>>> I would like to suggest that we only copy the content from
>>> https://github.com/DmitryKey/luke#important-notice which still makes
>>> sense, for example:
>>>
>>> - About Luke
>>> - Launching Luke
>>> - Search engines luke can deal with
>>> - Brief project history
>>>
>>> WDYT?
>>>
>>> I can do this if nobody else has time; just let me know, so that we
>>> don't do the same thing twice :-)
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> On 01.05.21 at 17:53, Tomoko Uchida wrote:
>>>
>>> I just partially backported the patch to the 8x branch. See the issue
>>> for how it looks.
>>> https://issues.apache.org/jira/browse/LUCENE-9947
>>>
>>> > Should I make a PR? If we use the markdown files from the old GitHub
>>> Location, we can easily publish them as part of official documentation.
>>>
>>> Thank you Uwe. But I don't think it is very meaningful to just copy
>>> the old repository's README or docs to our site. They are mostly filled
>>> with historical notes and outdated information.
>>>
>>> Tomoko
>>>
>>>
>>> Sun, 2 May 2021 0:38 Uwe Schindler :
>>>
>>>> Should I make a PR? If we use the markdown files from the old GitHub
>>>> Location, we can easily publish them as part of official documentation. The
>>>> correct module is below lucene/documentation.
>>>>
>>>> Uwe
>>>>
>>>> On May 1, 2021 3:10:22 PM UTC, Tomoko Uchida <
>>>> tomoko.uchida.1...@gmail.com> wrote:
>>>>>
>>>>> > Do you think it would still make sense to add some documentation as
>>>>> well to
>>>>> > https://github.com/apache/lucene/tree/main/lucene/luke
>>>>>
>>>>> Yes, my PR is orthogonal to the suggestion.
>>>>>
>>>>> Just to make things a little clearer, the README file under the module
>>>>> directory won't be published as part of the official documentation
>>>>> (Lucene website).
>>>>> Related to Uwe's suggestion, we have an issue for Luke documentation.
>>>>> If you're really interested in making changes to the official documentation
>>>>> as you first mentioned, feel free to get involved; but I think it would
>>>>> need a bit of work.
>>>>> https://issues.apache.org/jira/browse/LUCENE-9459
>>>>>
>>>>> Tomoko
>>>>>
>>>>>
>>>>> Sat, 1 May 2021 23:42 Michael Wechner :
>>>>>
>>>>>> ah ok :-)
>>>>>>
>>>>>> Do you have an idea when 9.0 will be released approximately?
>>>>>>
>>>>>> Do you think it would still make sense to add some documentation as well to
>>>>>> https://github.com/apache/lucene/tree/main/lucene/luke ?

Re: Moving usage documentation of Luke to Lucene Website

2021-05-03 Thread Dmitry Kan
>>>>> Sat, 1 May 2021 23:04 Michael Wechner :
>>>>>
>>>>>> I am a little confused now :-) where exactly are these changes now?
>>>>>>
>>>>>> If I understand correctly that PR has already been merged (
>>>>>> https://github.com/apache/lucene/pull/120), but I cannot see the
>>>>>> changes at
>>>>>>
>>>>>> https://lucene.apache.org/core/8_8_2/index.html
>>>>>> https://lucene.apache.org/core/8_8_2/luke/index.html
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> On 01.05.21 at 11:15, Tomoko Uchida wrote:
>>>>>>
>>>>>> Just FYI, we're going to remove Javadocs for the luke module from the
>>>>>> documentation site (it does not make much sense to publish API docs for
>>>>>> a GUI tool anyway).
>>>>>> Instead, I've added a simple description about how to launch the app.
>>>>>> https://github.com/apache/lucene/pull/120
>>>>>>
>>>>>> Tomoko
>>>>>>
>>>>>>
>>>>>> Sat, 1 May 2021 8:15 Robert Muir :
>>>>>>
>>>>>>> please submit a PR if you have the time!
>>>>>>>
>>>>>>> On Fri, Apr 30, 2021 at 3:38 PM Michael Wechner
>>>>>>>  wrote:
>>>>>>> >
>>>>>>> > sounds good to me :-)
>>>>>>> >
>>>>>>> > Would you like me to do a pull request or will you copy the
>>>>>>> README.md
>>>>>>> > yourself?
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > Michael
>>>>>>> >
>>>>>>> > On 30.04.21 at 16:59, Robert Muir wrote:
>>>>>>> > > I'll throw out the idea of moving the README.md to
>>>>>>> > > https://github.com/apache/lucene/tree/main/lucene/luke
>>>>>>> > >
>>>>>>> > > That way it is in the main source tree, versioned, and pushed as
>>>>>>> part
>>>>>>> > > of the release process. We could then fix the gradle
>>>>>>> documentation
>>>>>>> > > task to incorporate any module's README.md into the main release
>>>>>>> > > documentation file (such as
>>>>>>> > > https://lucene.apache.org/core/8_8_2/index.html). Also for now
>>>>>>> at
>>>>>>> > > least, it would be visible from github too.
>>>>>>> > >
>>>>>>> > > On Fri, Apr 30, 2021 at 10:32 AM Jan Høydahl <
>>>>>>> jan@cominvent.com> wrote:
>>>>>>> > >> Hi
>>>>>>> > >>
>>>>>>> > >> That could be feasible. Have a look at the website git repo at
>>>>>>> https://github.com/apache/lucene-site.
>>>>>>> > >> Should be fairly easy to move the Markdown files from github to
>>>>>>> this site, which also accepts Markdown...
>>>>>>> > >>
>>>>>>> > >> Jan
>>>>>>> > >>
>>>>>>> > >>> On 30 Apr 2021 at 08:41, Michael Wechner <
>>>>>>> michael.wech...@wyona.com> wrote:
>>>>>>> > >>>
>>>>>>> > >>> Hi
>>>>>>> > >>>
>>>>>>> > >>> I just noticed that Luke became a Lucene Module
>>>>>>> > >>>
>>>>>>> > >>> https://github.com/DmitryKey/luke#important-notice
>>>>>>> > >>>
>>>>>>> https://mocobeta.medium.com/luke-become-an-apache-lucene-module-as-of-lucene-8-1-7d139c998b2
>>>>>>> > >>> https://lucene.apache.org/core/8_8_2/luke/index.html
>>>>>>> > >>>
>>>>>>> > >>> Wouldn't it make sense to move the "Usage documentation of
>>>>>>> Luke" from https://github.com/DmitryKey/luke#luke to somewhere
>>>>>>> inside lucene.apache.org?
>>>>>>> > >>>
>>>>>>> > >>> Thanks
>>>>>>> > >>>
>>>>>>> > >>> Michael
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>
>>>>>
>>> --
>>> Uwe Schindler
>>> Achterdiek 19, 28357 Bremen
>>> https://www.thetaphi.de
>>>
>>
>>

-- 
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: https://dmitry-kan.medium.com/
Twitter: http://twitter.com/dmitrykan


Re: Questions about the new vector API

2021-04-02 Thread Dmitry Kan
Hi Michael,

No worries about the misspelling -- living abroad I see it happen frequently,
and have frankly gotten used to it!

The main reason for asking which algos you've used is that some of them
have hyper-parameters that can be exposed to users in order for them to
decide the recall / qps tradeoff. Have you considered this? In particular,
nmslib offers such capability:
https://opendistro.github.io/for-elasticsearch-docs/docs/knn/settings/#index-settings
(nmslib was implemented in Open Distro Elasticsearch).
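
For example, in Open Distro the HNSW hyper-parameters are exposed as plain
index settings; a rough sketch from memory (index name and values made up,
the page above is the authoritative reference):

PUT /my-knn-index
{
  "settings": {
    "index.knn": true,
    "index.knn.algo_param.ef_search": 512,
    "index.knn.algo_param.ef_construction": 512,
    "index.knn.algo_param.m": 16
  }
}

Raising ef_search buys recall at the cost of qps, which is exactly the
tradeoff knob I mean.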

Also, if you are interested, please take a look at
https://towardsdatascience.com/speeding-up-bert-search-in-elasticsearch-750f1f34f455
which studies the impact of a KNN algorithm on indexing and querying
speed, as well as disk usage. I will be grateful for your feedback.

Is there documentation for how to use LUCENE-9004?

Best,
Dmitry

On Sun, 28 Mar 2021 at 13:51, Michael Sokolov  wrote:

> Ugh sorry for misspelling your name, I blame the phone!
>
> On Sun, Mar 28, 2021, 6:50 AM Michael Sokolov  wrote:
>
>> Hi Dimitry, I worked initially from the papers cited in LUCENE-9004,
>> which I think is also what Tomoko was doing. Later I did refer to nmslib
>> too.
>>
>> On Sat, Mar 27, 2021, 6:01 AM Dmitry Kan  wrote:
>>
>>> Michael,
>>>
>>> I've gotten interested in this area and have been doing a comparative study
>>> of different KNN implementations and blogging about it.
>>>
>>> Did you use nmslib for HNSW implementation or something else?
>>>
>>> On Tue, 16 Mar 2021 at 22:47, Michael Sokolov 
>>> wrote:
>>>
>>>> Yeah, HNSW is problematic in a few ways: (1) merging is costly due to
>>>> the need to completely recreate the graph. (2) searching across a
>>>> segmented index sacrifices much of the performance benefit of HNSW
>>>> since the cost of searching HNSW graphs scales ~logarithmically with
>>>> the size of the graph, so splitting into multiple graphs and then
>>>> merge sorting results is pretty expensive. I guess the random access /
>>>> scan forward dynamic is another problematic area.
>>>>
>>>> On Tue, Mar 16, 2021 at 1:28 PM Robert Muir  wrote:
>>>> >
>>>> > Maybe that is so, but we should factor in everything: such as large
>>>> scale indexing, not requiring whole data set to be in RAM, etc. Hey, it's
>>>> Lucene!
>>>> >
>>>> > Because HNSW has dominated the nightly benchmarks, I have been
>>>> digging through stacktraces and trying to figure out ways to make it work
>>>> efficiently, and I'm not sure what to do.
>>>> > Especially merge is painful: it seems to cause a storm of page
>>>> faults/random accesses due to how it works, and I don't know yet how to
>>>> make it better.
>>>> > It seems to rebuild the entire graph, spraying random accesses across
>>>> a "slow-wrapper" that binary searches each sub on every access.
>>>> > I don't see any way to even amortize the pain with some kind of bulk
>>>> merge trick.
>>>> >
>>>> > So if we find algorithms that scale better, I think we should lend a
>>>> preference towards them. For example, algorithms that allow
>>>> per-segment/sequential index and merge.
>>>> >
>>>> > On Tue, Mar 16, 2021 at 1:06 PM Michael Sokolov 
>>>> wrote:
>>>> >>
>>>> >> ann-benchmarks.com maintains open benchmarks of a bunch of ANN
>>>> >> (approximate NN) algorithms. When we started this effort, HNSW was at
>>>> >> the top of the heap in most of the benchmarks.
>>>> >>
>>>> >> On Tue, Mar 16, 2021 at 12:28 PM Robert Muir 
>>>> wrote:
>>>> >> >
>>>> >> > Where are the alternative algorithms that work on sequential
>>>> iterators and don't need random access?
>>>> >> >
>>>> >> > Seems like these should be the ones we initially add to lucene,
>>>> and HNSW should be put aside for now? (is it a toy, or can we do it without
>>>> jazillions of random accesses?)
>>>> >> >
>>>> >> > On Tue, Mar 16, 2021 at 12:15 PM Michael Sokolov <
>>>> msoko...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> There's also some good discussion on
>>>> >> >> https://issues.apache.org/jira/browse/LUCENE-9583 about random
>>>> access
>>>> >> >

Re: Questions about the new vector API

2021-03-27 Thread Dmitry Kan
ways
> >> >> > > require an additional decoding step. I can see that the naming is
> >> >> > > confusing there. The intent is that you index the vector values,
> but
> >> >> > > no additional indexing data structure. Also: the reason HNSW is
> >> >> > > mentioned in these SearchStrategy enums is to make room for other
> >> >> > > vector indexing approaches, like LSH. There was a lot of
> discussion
> >> >> > > that we wanted an API that allowed for experimenting with other
> >> >> > > techniques for indexing and searching vector values.
> >> >> > >
> >> >> > > Adrien, you made an analogy to PerFieldPostingsFormat (and
> DocValues),
> >> >> > > but I think the situation is more akin to Points, where we have
> the
> >> >> > > options on IndexableField. The metadata we store there
> (dimension and
> >> >> > > score function) don't really result in different formats, ie code
> >> >> > > paths for indexing and storage; they are more like parameters to
> the
> >> >> > > format, in my mind. Perhaps the situation will look different
> when we
> >> >> > > get our second vector indexing strategy (like LSH).
> >> >> > >
> >> >> > >
> >> >> > > On Tue, Mar 16, 2021 at 10:19 AM Tomoko Uchida
> >> >> > >  wrote:
> >> >> > > >
> >> >> > > > > Should we rename VectorFormat to VectorsFormat? This would
> be more consistent with other file formats that use the plural, like
> PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
> >> >> > > >
> >> >> > > > +1 for using plural form for consistency - if we reconsider
> the names, how about VectorValuesFormat so that it follows the naming
> convention for XXXValues?
> >> >> > > >
> >> >> > > > DocValuesFormat / DocValues
> >> >> > > > PointValuesFormat / PointValues
> >> >> > > > VectorValuesFormat / VectorValues (currently, VectorFormat /
> VectorValues)
> >> >> > > >
> >> >> > > > > Should SearchStrategy constants avoid explicit references to
> HNSW?
> >> >> > > >
> >> >> > > > Also +1 for decoupling HNSW specific implementations from
> general vectors, though I am not fully sure if we can strictly separate the
> similarity metrics and search algorithms for vectors.
> >> >> > > > LUCENE-9322 (unified vectors API) was resolved months ago,
> does it achieve its goal? I haven't followed the issue in months because of
> my laziness...
> >> >> > > >
> >> >> > > > Thanks,
> >> >> > > > Tomoko
> >> >> > > >
> >> >> > > >
> >> >> > > > 2021年3月16日(火) 19:32 Adrien Grand :
> >> >> > > >>
> >> >> > > >> Hello,
> >> >> > > >>
> >> >> > > >> I've tried to catch up on the vector API and I have the
> following questions. I've tried to read through discussions on JIRA first
> in case it had been covered, but it's possible I missed some relevant ones.
> >> >> > > >>
> >> >> > > >> Should VectorValues#search be on VectorReader instead? It
> felt a bit odd to me to have the search logic on the iterator.
> >> >> > > >>
> >> >> > > >> Do we need SearchStrategy.NONE? Documentation suggests that
> it allows storing vectors but that NN search won't be supported. This looks
> like a use-case for binary doc values to me? It also slightly caught me by
> surprise due to the inconsistency with IndexOptions.NONE, which means "do
> not index this field" (and likewise for DocValuesType.NONE), so I first
> assumed that SearchStrategy.NONE also meant "do not index this field as a
> vector".
> >> >> > > >>
> >> >> > > >> While postings and doc-value formats allow per-field
> configuration via PerFieldPostingsFormat/PerFieldDocValuesFormat, vectors
> use a different mechanism where VectorField#createHnswType sets attributes
> on the field type that the vectors writer then reads. Should we have a
> PerFieldVectorsFormat instead and configure these options via the vectors
> format?
> >> >> > > >>
> >> >> > > >> Should SearchStrategy constants avoid explicit references to
> HNSW? The rest of the API seems to try to be agnostic of the way that NN
> search is implemented. Could we make SearchStrategy only about the
> similarity metric that is used for vectors? This particular point seems
> discussed on LUCENE-9322 but I couldn't find the conclusion.
> >> >> > > >>
> >> >> > > >> Should we rename VectorFormat to VectorsFormat? This would be
> more consistent with other file formats that use the plural, like
> PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
> >> >> > > >>
> >> >> > > >> --
> >> >> > > >> Adrien
> >> >>

-- 
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Questions about corrupted Segments files.

2019-11-06 Thread Dmitry Kan
Hi Kaya,

Try luke:
http://dmitrykan.blogspot.com/2018/01/new-luke-on-javafx.html
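
On the '-fix' error: if I remember correctly, that option was renamed to
'-exorcise' around Lucene 5.0, which would explain the "unexpected extra
argument" message on 7.7.2. Something along these lines should be accepted
(same classpath and index path as in your command):

java -cp lucene-core-7.7.2.jar -ea:org.apache.lucene... \
  org.apache.lucene.index.CheckIndex \
  solr/server/solr/basic_copy/data/index -exorcise

Note that -exorcise does not repair anything: it drops the unreadable
segments, and the documents in them are lost, so treat it as a last resort.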

Best,

Dmitry

On Wed 6. Nov 2019 at 3.24, Kayak28  wrote:

> Hello, Community members:
>
> I am using Solr 7.7.2.
> The other day, while indexing to Solr, my computer powered off.
> As a result, there are corrupted segment files.
>
> Is there any way to fix the corrupted segment files without re-indexing?
>
> I have read a blog post (in Japanese) about the CheckIndex tool, which
> can be used to detect/fix corrupted segment files, but when I tried to
> run the following command, I got the error message below.
> So, I am not sure if CheckIndex can actually fix the index files.
>
>
> java -cp lucene-core-7.7.2.jar -ea:org.apache.lucene...
> org.apache.lucene.index.CheckIndex solr/server/solr/basic_copy/data/index
> -fix
>
>
> ERROR: unexpected extra argument '-fix'
>
>
>
> If anybody knows about either a way to fix corrupted segment files or a
> way to use checkIndex '-fix' option correctly, could you please let me
> know?
>
> Any clue will be very appreciated.
>
> Sincerely,
> Kaya Ota
>
>
>
-- 
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Welcome Tomoko Uchida as Lucene/Solr committer

2019-04-08 Thread Dmitry Kan
Congratulations, Tomoko! Well deserved!

And I’m also happy to have helped by creating the github repo
https://github.com/DmitryKey/luke
(in fact a fork) back in 2013, after getting the ’blessing’ from luke’s
author Andrzej Bialecki, and working on the first releases.

Looking forward to running luke as part of Lucene!

Dmitry Kan

On Mon, 8 Apr 2019 at 23.01, Robert Muir  wrote:

> Welcome!
>
> On Mon, Apr 8, 2019 at 11:21 AM Uwe Schindler  wrote:
> >
> > Hi all,
> >
> > Please join me in welcoming Tomoko Uchida as the latest Lucene/Solr
> committer!
> >
> > She has been working on
> https://issues.apache.org/jira/browse/LUCENE-2562 for several years with
> awesome progress and finally we got the fantastic Luke as a branch on ASF
> JIRA:
> https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/jira/lucene-2562-luke-swing-3
> > Looking forward to the first release of Apache Lucene 8.1 with Luke
> bundled in the distribution. I will take care of merging it to master and
> 8.x branches together with her once she gets her ASF account.
> >
> > Tomoko also helped with the Japanese and Korean Analyzers.
> >
> > Congratulations and Welcome, Tomoko! Tomoko, it's traditional for you to
> introduce yourself with a brief bio.
> >
> > Uwe & Robert (who nominated Tomoko)
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


[jira] [Commented] (LUCENE-2562) Make Luke a Lucene/Solr Module

2018-08-01 Thread Dmitry Kan (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565703#comment-16565703
 ] 

Dmitry Kan commented on LUCENE-2562:


[~arafalov] thanks for your input! Can you please elaborate on 'If Luke is 
supposed to be part of Lucene-only distribution, I guess the discussion is a 
bit more complicated' ?

> Make Luke a Lucene/Solr Module
> --
>
> Key: LUCENE-2562
> URL: https://issues.apache.org/jira/browse/LUCENE-2562
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mark Miller
>Priority: Major
>  Labels: gsoc2014
> Attachments: LUCENE-2562-Ivy.patch, LUCENE-2562-Ivy.patch, 
> LUCENE-2562-Ivy.patch, LUCENE-2562-ivy.patch, LUCENE-2562.patch, 
> LUCENE-2562.patch, Luke-ALE-1.png, Luke-ALE-2.png, Luke-ALE-3.png, 
> Luke-ALE-4.png, Luke-ALE-5.png, luke-javafx1.png, luke-javafx2.png, 
> luke-javafx3.png, luke1.jpg, luke2.jpg, luke3.jpg, lukeALE-documents.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> see
> "RE: Luke - in need of maintainer": 
> http://markmail.org/message/m4gsto7giltvrpuf
> "Web-based Luke": http://markmail.org/message/4xwps7p7ifltme5q
> I think it would be great if there was a version of Luke that always worked 
> with trunk - and it would also be great if it was easier to match Luke jars 
> with Lucene versions.
> While I'd like to get GWT Luke into the mix as well, I think the easiest 
> starting point is to straight port Luke to another UI toolkit before 
> abstracting out DTO objects that both GWT Luke and Pivot Luke could share.
> I've started slowly converting Luke's use of thinlet to Apache Pivot. I 
> haven't/don't have a lot of time for this at the moment, but I've plugged 
> away here and there over the past week or two. There is still a *lot* to do.






[jira] [Commented] (LUCENE-2562) Make Luke a Lucene/Solr Module

2018-07-21 Thread Dmitry Kan (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551676#comment-16551676
 ] 

Dmitry Kan commented on LUCENE-2562:


Hi [~steve_rowe] thanks for your support with filing the ticket. Looking to 
solve this one way or another.

Thanks [~Tomoko Uchida] for your contribution and research so far!

> Make Luke a Lucene/Solr Module
> --
>
> Key: LUCENE-2562
> URL: https://issues.apache.org/jira/browse/LUCENE-2562
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mark Miller
>Priority: Major
>  Labels: gsoc2014
> Attachments: LUCENE-2562-Ivy.patch, LUCENE-2562-Ivy.patch, 
> LUCENE-2562-Ivy.patch, LUCENE-2562-ivy.patch, LUCENE-2562.patch, 
> LUCENE-2562.patch, Luke-ALE-1.png, Luke-ALE-2.png, Luke-ALE-3.png, 
> Luke-ALE-4.png, Luke-ALE-5.png, luke-javafx1.png, luke-javafx2.png, 
> luke-javafx3.png, luke1.jpg, luke2.jpg, luke3.jpg, lukeALE-documents.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> see
> "RE: Luke - in need of maintainer": 
> http://markmail.org/message/m4gsto7giltvrpuf
> "Web-based Luke": http://markmail.org/message/4xwps7p7ifltme5q
> I think it would be great if there was a version of Luke that always worked 
> with trunk - and it would also be great if it was easier to match Luke jars 
> with Lucene versions.
> While I'd like to get GWT Luke into the mix as well, I think the easiest 
> starting point is to straight port Luke to another UI toolkit before 
> abstracting out DTO objects that both GWT Luke and Pivot Luke could share.
> I've started slowly converting Luke's use of thinlet to Apache Pivot. I 
> haven't/don't have a lot of time for this at the moment, but I've plugged 
> away here and there over the past week or two. There is still a *lot* to do.






Re: GSOC2017: Call to Solr and Tika/Nutch/Camel/NiFi/Zeppelin/etc mentors

2017-03-26 Thread Dmitry Kan
Hi Alexandre,

Forwarded your call to luke's google group:
https://groups.google.com/forum/#!topic/luke-discuss/rmZo7R3gDdc

There might be potential for solr/lucene/luke projects, like adding the
capability to open a solr/lucene index in luke from a remote server:
https://github.com/DmitryKey/luke/issues/68

Good luck with SOC!

Regards,
Dmitry
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

On 17 March 2017 at 16:25, Alexandre Rafalovitch  wrote:

> I am mentoring in this year's Google Summer of Code (first timer!). I
> know there are a couple of us from the Solr project, but I also noticed
> some mentors from upstream/downstream projects.
>
> I am proposing that we put at least a couple of integration projects
> to improve/upgrade Solr integration with both upstream (Tika) and
> downstream (Nutch/Camel/NiFi) projects. And maybe even propose some
> new projects, such as Solr backend engine for Zeppelin (so we could
> have a Python Notebook-like interface to Solr, not just via JDBC
> bridge, we do already).
>
> I am not sure whose JIRAs they should go into, but the project idea
> tag spans all ASF projects, so it is more important to figure out
> mentor-level agreement first.
>
> In any case, to push this forward, if we get several students all
> working on Solr, I could run a Solr bootcamp class for students,
> mentors, and whoever else in the sister communities wants to
> participate and get more familiar with Solr. We could also run a
> parallel mini-list (or Gitter room or whatever) where multiple new
> implementors of Solr integrations can hang out together and progress
> together.
>
>
> Regards,
>Alex.
> P.s. I am also working on redoing Solr examples (starting from DIH
> ones at: SOLR-10311). If anybody has comments on what kind of examples
> would make integrations easier, I am very receptive.
> P.p.s. Feel free to forward this to the other mailing lists for other
> relevant sister Apache communities.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>


[jira] [Commented] (SOLR-10231) Cursor value always different for last page with sorting by a date based function using NOW

2017-03-13 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907528#comment-15907528
 ] 

Dmitry Kan commented on SOLR-10231:
---

[~hossman] thanks for clarifying and the suggestions. Going to test the fixed 
timestamp value for the NOW param. In the meantime we fell back to a non-cursor 
pagination method. Btw, would the same issue exist in 6.x?
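
For example, something along these lines (the epoch-millis value is made up), 
so that every page of the cursor walk evaluates ms(NOW,...) against the same 
instant:

{code}
http://solr-host:8983/solr/documents/select?rows=1&q=*:*&fq=DocumentId:76581059&NOW=1489363200000&cursorMark=*&fl=DocumentId&sort=min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5))%20asc,Id%20desc
{code}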

> Cursor value always different for last page with sorting by a date based 
> function using NOW
> ---
>
> Key: SOLR-10231
> URL: https://issues.apache.org/jira/browse/SOLR-10231
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 4.10.2
>Reporter: Dmitry Kan
>
> Cursor-based result fetching is a game changer for search performance.
> It works extremely well when paging using sort by field(s).
> An example that works (Id is the unique field in the schema):
> Query:
> {code}
> http://solr-host:8983/solr/documents/select?q=*:*&fq=DocumentId:76581059&cursorMark=AoIGAC5TU1ItNzY1ODEwNTktMQ==&fl=DocumentId&sort=UserId+asc%2CId+desc&rows=1
> {code}
> Response:
> {code}
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">4</int>
>     <lst name="params">
>       <str name="q">*:*</str>
>       <str name="fl">DocumentId</str>
>       <str name="cursorMark">AoIGAC5TU1ItNzY1ODEwNTktMQ==</str>
>       <str name="fq">DocumentId:76581059</str>
>       <str name="sort">UserId asc,Id desc</str>
>       <str name="rows">1</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="1" start="0"/>
>   <str name="nextCursorMark">AoIGAC5TU1ItNzY1ODEwNTktMQ==</str>
> </response>
> {code}
> nextCursorMark equals cursorMark, so we know this is the last page.
> However, sorting by function behaves differently:
> Query:
> {code}
> http://solr-host:8983/solr/documents/select?rows=1&q=*:*&fq=DocumentId:76581059&cursorMark=AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE=&fl=DocumentId&sort=min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5))%20asc,Id%20desc
> {code}
> Response:
> {code}
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">6</int>
>     <lst name="params">
>       <str name="q">*:*</str>
>       <str name="fl">DocumentId</str>
>       <str name="cursorMark">AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE=</str>
>       <str name="fq">DocumentId:76581059</str>
>       <str name="sort">min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5)) asc,Id desc</str>
>       <str name="rows">1</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="1" start="0">
>     <doc>
>       <str name="DocumentId">76581059</str>
>     </doc>
>   </result>
>   <str name="nextCursorMark">AoIFQf9yFyAuU1NSLTc2NTgxMDU5LTE=</str>
> </response>
> {code}
> nextCursorMark does not equal cursorMark, which suggests there are more
> results. This is not true (numFound=1), and so the client goes into an
> infinite loop.






[jira] [Created] (SOLR-10231) Cursor value always different for last page with sorting by function

2017-03-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-10231:
-

 Summary: Cursor value always different for last page with sorting 
by function
 Key: SOLR-10231
 URL: https://issues.apache.org/jira/browse/SOLR-10231
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SearchComponents - other
Affects Versions: 4.10.2
Reporter: Dmitry Kan


Cursor-based result fetching is a game changer for search performance.
It works extremely well when paging using sort by field(s).

An example that works (Id is the unique field in the schema):
Query:
{code}
http://solr-host:8983/solr/documents/select?q=*:*&fq=DocumentId:76581059&cursorMark=AoIGAC5TU1ItNzY1ODEwNTktMQ==&fl=DocumentId&sort=UserId+asc%2CId+desc&rows=1
{code}
Response:
{code}
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="fl">DocumentId</str>
      <str name="cursorMark">AoIGAC5TU1ItNzY1ODEwNTktMQ==</str>
      <str name="fq">DocumentId:76581059</str>
      <str name="sort">UserId asc,Id desc</str>
      <str name="rows">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0"/>
  <str name="nextCursorMark">AoIGAC5TU1ItNzY1ODEwNTktMQ==</str>
</response>
{code}

nextCursorMark equals cursorMark, so we know this is the last page.

However, sorting by function behaves differently:
Query:
{code}
http://solr-host:8983/solr/documents/select?rows=1&q=*:*&fq=DocumentId:76581059&cursorMark=AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE=&fl=DocumentId&sort=min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5))%20asc,Id%20desc
{code}
Response:
{code}
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">6</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="fl">DocumentId</str>
      <str name="cursorMark">AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE=</str>
      <str name="fq">DocumentId:76581059</str>
      <str name="sort">min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5)) asc,Id desc</str>
      <str name="rows">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="DocumentId">76581059</str>
    </doc>
  </result>
  <str name="nextCursorMark">AoIFQf9yFyAuU1NSLTc2NTgxMDU5LTE=</str>
</response>
{code}

nextCursorMark does not equal cursorMark, which suggests there are more
results. This is not true (numFound=1), and so the client goes into an
infinite loop.






Re: Welcome Toke Eskildsen as a Lucene/Solr committer

2017-02-16 Thread Dmitry Kan
Hi Toke, congrats! Glad for you and well deserved!

P.S. It was awesome to test the faceting module speed-ups you did for high
cardinality fields. Your skill at explaining complex things clearly is
unmatched.

Dmitry
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

On 14 February 2017 at 20:11, Toke Eskildsen  wrote:

> Thank you for the invitation and the warm welcome.
>
>
> I am a 43 year old Danish man, with a family and a job at the Royal Danish
> Library, where I have been working mostly with search-related technology
> for 10 years.
>
> I have done a fair bit of Lucene/Solr hacking during the years, with focus
> on speed- and memory-optimizations. Implementing bit-packing structures,
> eliminating steps in calculations and in general making more things
> possible on less hardware is a bit of an obsession. I hope to continue in
> that direction as a committer and am looking forward to a more controlled
> and community-oriented way of writing code: The one-man-show is a lot of
> fun and can work well for specific use cases, but it tends to get a bit out
> of control and the result might not be that usable elsewhere.
>
> Happy to be here,
> Toke
>


Re: [Result Query Solr] How to retrieve the content of pdfs

2016-09-20 Thread Dmitry Kan
Hi Alexandre,

Could you add fl=* to your query and check the output? Alternatively, have
a look at your schema file and check what the content field could look
like: text or similar.
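
For example, something like this (host and collection name as in your setup):

http://localhost:8983/solr/mycollection/select?q=ABC&fl=*&wt=json&indent=on

If a stored content field exists, fl=* will reveal its name and you can then
list it explicitly in fl.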

Dmitry

On 14 Sep 2016 at 1:27 AM, "Alexandre Martins" <
alexandremart...@gmail.com> wrote:

> Hi Guys,
>
> I'm trying to use the latest version of Solr and I have used the post tool
> to upload 28 PDF files, and it works fine. However, I don't know how to show
> the content of the files in the resulting JSON. Does anybody know how to
> include this field?
>
> "responseHeader":{ "zkConnected":true, "status":0, "QTime":43, "params":{
> "q":"ABC", "indent":"on", "wt":"json", "_":"1473804420750"}}, "response":
> {"numFound":40,"start":0,"maxScore":9.1066065,"docs":[ { "id":
> "/home/alexandre/desenvolvimento/workspace/solr-6.2.0/pdfs_hack/abc.pdf",
> "date":["2016-09-13T14:44:17Z"], "pdf_pdfversion":[1.5], "xmp_creatortool
> ":["PDFCreator Version 1.7.3"], "stream_content_type":["application/pdf"],
> "access_permission_modify_annotations":[false], "
> access_permission_can_print_degraded":[false], "dc_creator":["abc"], "
> dcterms_created":["2016-09-13T14:44:17Z"], "last_modified":["2016-09-
> 13T14:44:17Z"], "dcterms_modified":["2016-09-13T14:44:17Z"], 
> "dc_format":["application/pdf;
> version=1.5"], "title":["ABC tittle"], "xmpmm_documentid":["uuid:
> 100ccff2-7c1c-11e6--ab7b62fc46ae"], "last_save_date":["2016-09-
> 13T14:44:17Z"], "access_permission_fill_in_form":[false], "meta_save_date
> ":["2016-09-13T14:44:17Z"], "pdf_encrypted":[false], "dc_title":["Tittle
> abc"], "modified":["2016-09-13T14:44:17Z"], "content_type":["application/
> pdf"], "stream_size":[101948], "x_parsed_by":["org.apache.
> tika.parser.DefaultParser", "org.apache.tika.parser.pdf.PDFParser"], "
> creator":["mauricio.tostes"], "meta_author":["mauricio.tostes"], "
> meta_creation_date":["2016-09-13T14:44:17Z"], "created":["Tue Sep 13
> 14:44:17 UTC 2016"], "access_permission_extract_for_accessibility":[false],
> "access_permission_assemble_document":[false], "xmptpg_npages":[3], "
> creation_date":["2016-09-13T14:44:17Z"], "resourcename":["/home/
> alexandre/desenvolvimento/workspace/solr-6.2.0/pdfs_hack/abc.pdf"], "
> access_permission_extract_content":[false], "access_permission_can_print":
> [false], "author":["abc.add"], "producer":["GPL Ghostscript 9.10"], "
> access_permission_can_modify":[false], "_version_":1545395897488113664},
>
> Alexandre Costa Martins
> DATAPREV - IT Analyst
> Software Reuse Researcher
> MSc Federal University of Pernambuco
> RiSE Member - http://www.rise.com.br
> Sun Certified Programmer for Java 5.0 (SCPJ5.0)
>
> MSN: xandecmart...@hotmail.com
> GTalk: alexandremart...@gmail.com
> Skype: xandecmartins
> Mobile: +55 (85) 9626-3631
>


Re: ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

2015-11-05 Thread Dmitry Kan
Hi Alan,

Thanks! That is already something. I will take a look at the code.
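
For reference, this is roughly how we plug the stock rewrite in today
(field name and size are made up):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanQuery;

class TopTermsRewriteExample {
  static SpanQuery compPrefix() {
    // expand comp* to at most the 50 top-scoring terms -- the behaviour
    // we would like to switch to "top terms by frequency"
    SpanMultiTermQueryWrapper<PrefixQuery> wrapper =
        new SpanMultiTermQueryWrapper<>(new PrefixQuery(new Term("text", "comp")));
    wrapper.setRewriteMethod(
        new SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite(50));
    return wrapper;
  }
}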

Dmitry
On 5 Nov 2015 at 11:44 AM, "Alan Woodward" 
wrote:

> Hi Dmitry,
>
> This isn't quite as simple as it seems, unfortunately, because
> TopTermsRewrite expects the 'score' for each term to be the same across all
> segments, and that won't be the case with frequencies.
>
> I tried to come up with a solution in LUCENE-6513, but we didn't really
> come to a consensus on how best to do it.  But you could probably take the
> code in there and use it to write your own RewriteMethod.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 5 Nov 2015, at 09:25, Dmitry Kan wrote:
>
> Hello,
>
> Cross-posting the same question from solr mailing list, hopefully with
> better luck.
>
> Are there ways to affect the strategy
> behind SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite?
>
> As it seems, at the moment the rewrite method loads at most N terms that
> maximize term score. How can this be changed to load the top terms by
> frequency, for example?
>
> An example: for comp* to load "company", if it is among the top N most
> frequent terms in the index, and not less obvious words like "comp'd",
> "comp692", "compacta", etc.
>
> Thanks,
> Dmitry
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
>
>
>


ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

2015-11-05 Thread Dmitry Kan
Hello,

Cross-posting the same question from solr mailing list, hopefully with
better luck.

Are there ways to affect the strategy behind
SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite?

As it seems, at the moment the rewrite method loads at most N terms that
maximize term score. How can this be changed to load the top terms by
frequency, for example?

An example: for comp* to load "company", if it is among the top N most
frequent terms in the index, and not less obvious words like "comp'd",
"comp692", "compacta", etc.

Thanks,
Dmitry

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: TokenOrderingFilter

2015-06-04 Thread Dmitry Kan
Hi David,

Thanks for your quick reply.

In fact, we do use WDF in 4.10.2. It very much looks like what you explain:
the offsets are preserved in monotonically increasing order. Here is
the list of filters we use on the indexing side:

solr.MappingCharFilterFactory

solr.StandardTokenizerFactory

solr.StandardFilterFactory

solr.WordDelimiterFilterFactory

solr.LowerCaseFilterFactory

custom filters that do not tamper with the order of the offsets.




On 4 June 2015 at 18:35, david.w.smi...@gmail.com 
wrote:

> Hi Dmitry,
>
> Ideally, the token stream produces tokens that have a startOffset >= the
> startOffset of the previous token from the stream.  Sometime in the past
> year or so, this was enforced at the indexing layer, I think.  There used
> to be TokenFilters that violated this contract; I think earlier versions of
> WordDelimiterFilter could.  If my assumption that this is asserted at the
> indexing layer is correct, then I think TokenOrderingFilter is obsolete.
>
> ~ David
>
> On Thu, Jun 4, 2015 at 7:48 AM Dmitry Kan  wrote:
>
>>Hi guys,
>>
>> Sorry for sending questions to the dev list and not to the user one.
>> Somehow I'm getting more luck here.
>>
>> We have found the class o.a.solr.highlight.TokenOrderingFilter
>> with the following comment:
>>
>>
>> /**
>>  * Orders Tokens in a window first by their startOffset ascending.
>>  * endOffset is currently ignored.
>>  * This is meant to work around fickleness in the highlighter only.  It
>>  * can mess up token positions and should not be used for indexing or querying.
>>  */
>> final class TokenOrderingFilter extends TokenFilter {
>>
>> In fact, removing this class didn't change the behaviour of the highlighter.
>>
>> Could anybody shed light on its necessity?
>>
>> Thanks,
>>
>> Dmitry Kan
>>
>>


TokenOrderingFilter

2015-06-04 Thread Dmitry Kan
Hi guys,

Sorry for sending questions to the dev list and not to the user one.
Somehow I'm getting more luck here.

We have found the class o.a.solr.highlight.TokenOrderingFilter
with the following comment:


/**
 * Orders Tokens in a window first by their startOffset ascending.
 * endOffset is currently ignored.
 * This is meant to work around fickleness in the highlighter only.  It
 * can mess up token positions and should not be used for indexing or querying.
 */
final class TokenOrderingFilter extends TokenFilter {

In fact, removing this class didn't change the behaviour of the highlighter.

Could anybody shed light on its necessity?

Thanks,

Dmitry Kan


Re: Modifying DefaultSolrHighlighter

2015-05-06 Thread Dmitry Kan
Thanks, David.

Will let you know how it went.
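
For anyone finding this thread later, a minimal skeleton of the subclass
(class name is made up):

import org.apache.solr.core.SolrCore;
import org.apache.solr.highlight.DefaultSolrHighlighter;

public class CustomSolrHighlighter extends DefaultSolrHighlighter {
  // Solr instantiates the highlighter via reflection and passes the core
  // in, so this is the constructor that must exist.
  public CustomSolrHighlighter(SolrCore core) {
    super(core);
  }
}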

On 5 May 2015 at 20:01, david.w.smi...@gmail.com 
wrote:

> Yes.
>
> On Tue, May 5, 2015 at 8:29 AM Dmitry Kan  wrote:
>
>> Hi David,
>>
>> Thanks for replying so quick! Indeed, the NPE points to SolrCore being
>> null. So of the following two ctors:
>>
>> public DefaultSolrHighlighter() {
>> }
>>
>> public DefaultSolrHighlighter(SolrCore solrCore) {
>>   this.solrCore = solrCore;
>> }
>>
>>
>>
>> should we use the second one?
>>
>> Regards,
>> Dmitry
>>
>> On 5 May 2015 at 15:03, david.w.smi...@gmail.com <
>> david.w.smi...@gmail.com> wrote:
>>
>>> Hi Dmitry,
>>>
>>> I am pretty well versed in the sub-class-ability of
>>> DefaultSolrHighlighter.  Most likely the problem you see is that you are
>>> using the no-arg constructor.  Instead, pass in a SolrCore.  It is called
>>> via reflection.  In 5.2 I removed the no-arg constructor.
>>>
>>> ~ David
>>>
>>> On Tue, May 5, 2015 at 4:24 AM Dmitry Kan 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We need to modify the behaviour of the DefaultSolrHighlighter class
>>>> slightly. When we tried to extend the class, Solr printed an NPE.
>>>>
>>>> Is there some reason for the NPE when extending the class?
>>>>
>>>> Thanks,
>>>>
>>>> Dmitry Kan
>>>>
>>>
>>


Re: Modifying DefaultSolrHighlighter

2015-05-05 Thread Dmitry Kan
Hi David,

Thanks for replying so quick! Indeed, the NPE points to SolrCore being
null. So of the following two ctors:

public DefaultSolrHighlighter() {
}

public DefaultSolrHighlighter(SolrCore solrCore) {
  this.solrCore = solrCore;
}



should we use the second one?

Regards,
Dmitry

On 5 May 2015 at 15:03, david.w.smi...@gmail.com 
wrote:

> Hi Dmitry,
>
> I am pretty well versed in the sub-class-ability of
> DefaultSolrHighlighter.  Most likely the problem you see is that you are
> using the no-arg constructor.  Instead, pass in a SolrCore.  It is called
> via reflection.  In 5.2 I removed the no-arg constructor.
>
> ~ David
>
> On Tue, May 5, 2015 at 4:24 AM Dmitry Kan  wrote:
>
>> Hi,
>>
>> We need to modify the behaviour of the DefaultSolrHighlighter class slightly.
>> When we tried to extend the class, Solr printed an NPE.
>>
>> Is there some reason for the NPE when extending the class?
>>
>> Thanks,
>>
>> Dmitry Kan
>>
>


Modifying DefaultSolrHighlighter

2015-05-05 Thread Dmitry Kan
Hi,

We need to modify the behaviour of the DefaultSolrHighlighter class slightly.
When we tried to extend the class, Solr printed an NPE.

Is there some reason for the NPE when extending the class?

Thanks,

Dmitry Kan


Re: 16K threads used up, Solr 4.10 doing nothing.

2015-04-25 Thread Dmitry Kan
Hi Erick,

Do you know whether the client used facet.threads, even once?

There's a bug in Solr's threaded faceting that can spin up an unbounded
number of threads.
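
For reference, facet.threads is a per-request parameter, and if I remember
the ref guide right, a negative value allows Solr to create up to
Integer.MAX_VALUE threads. So a single request like this (field name made
up) can be enough to trigger it:

http://solr-host:8983/solr/collection1/select?q=*:*&facet=true&facet.field=category&facet.threads=-1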

Regards,
Dmitry
On 24 Apr 2015 3:17 am, "Erick Erickson"  wrote:

> A client had a Solr instance doing absolutely nothing for a month.
> Literally a test system that was idle. When they tried to finally do
> something, they couldn't. That Solr process had over 16K threads
> operating. No indexing, no querying, was going on, nada.
>
> They investigated and found that the Solr couldn't connect to
> Zookeeper and had a zillion threads (well, actually about 16K which
> was their limit) with the stack trace at the end.
>
> Admittedly the client had a weird situation where Solr couldn't talk
> to Zookeeper, and admittedly Solr can't do much if it can't talk to
> ZK.
>
> Even so this seems odd. I'm also a bit worried that if they fix the
> reason Solr couldn't talk to ZK (maybe it's a firewall issue? plug the
> cable back in?) that when all those threads suddenly get to do their
> thing what will happen? Not to mention any effects on other processes.
>
> Anyway, if this is worth a JIRA I can create one if there aren't any
> already.
>
> Solr 4.10
>
> Here's the stack trace:
>
> "main-EventThread" daemon prio=10 tid=0x0a38d000 nid=0xeb51 in
> Object.wait() [0x7e8c15c89000]
>
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>at java.lang.Object.wait(Native Method)
>at
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:215)
> - locked <0x0003c5306fc0> (a
> org.apache.solr.common.cloud.ConnectionManager)
> at
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:138)
> at
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:56)
> at
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>
> Erick
>


Re: o.a.l.a.payloads.DelimitedPayloadTokenFilter reset()/close() call missing

2015-04-23 Thread Dmitry Kan
Hi Uwe,

I think it was my mistake in the code: in my lucene analyzer class I have
implemented the following method:

@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer tokenizer = new WhitespaceTokenizer(reader);
    TokenStream result = new LowerCaseFilter(tokenizer);
    result = new DelimitedPayloadTokenFilter(result, '|', encoder);
    TokenStreamComponents tokenStreamComponents =
        new TokenStreamComponents(new WhitespaceTokenizer(reader), result);
    return tokenStreamComponents;
}



It was a mistake to create WhitespaceTokenizer twice. The correct
implementation is:

@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer tokenizer = new WhitespaceTokenizer(reader);
    TokenStream result = new LowerCaseFilter(tokenizer);
    result = new DelimitedPayloadTokenFilter(result, '|', encoder);
    TokenStreamComponents tokenStreamComponents =
        new TokenStreamComponents(tokenizer, result);
    return tokenStreamComponents;
}
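
With this chain the payload rides inside the indexed text itself, after the
delimiter, e.g. (assuming encoder is a FloatEncoder):

hello|1.5 world|2.0

which indexes the tokens "hello" and "world" with float payloads 1.5 and 2.0.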



Sorry about the noise!



On 23 April 2015 at 17:14, Uwe Schindler  wrote:

> Of course!
>
> Do you have code to reproduce?
>
> Uwe
>
>
> On 23 April 2015 15:54:06 CEST, Dmitry Kan <
> dmitry.luc...@gmail.com> wrote:
>>
>> Hi,
>>
>> In Lucene 4.10.4 the DelimitedPayloadTokenFilter class seems to violate
>> the contract of the TokenStream. Should I raise a jira? Thanks.
>>
>>
>>
>> java.lang.IllegalStateException: TokenStream contract violation:
>> reset()/close() call missing, reset() called multiple times, or subclass
>> does not call super.reset(). Please see Javadocs of TokenStream class for
>> more information about the correct consuming workflow.
>> at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
>> at
>> org.apache.lucene.analysis.util.CharacterUtils.readFully(CharacterUtils.java:241)
>> at
>> org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:283)
>> at
>> org.apache.lucene.analysis.util.CharacterUtils.fill(CharacterUtils.java:231)
>> at
>> org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:148)
>> at
>> org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:62)
>> at
>> org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter.incrementToken(DelimitedPayloadTokenFilter.java:55)
>>
>>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
>


o.a.l.a.payloads.DelimitedPayloadTokenFilter reset()/close() call missing

2015-04-23 Thread Dmitry Kan
Hi,

In Lucene 4.10.4 the DelimitedPayloadTokenFilter class seems to violate the
contract of the TokenStream. Should I raise a jira? Thanks.



java.lang.IllegalStateException: TokenStream contract violation:
reset()/close() call missing, reset() called multiple times, or subclass
does not call super.reset(). Please see Javadocs of TokenStream class for
more information about the correct consuming workflow.
at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
at
org.apache.lucene.analysis.util.CharacterUtils.readFully(CharacterUtils.java:241)
at
org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:283)
at
org.apache.lucene.analysis.util.CharacterUtils.fill(CharacterUtils.java:231)
at
org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:148)
at
org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:62)
at
org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter.incrementToken(DelimitedPayloadTokenFilter.java:55)


[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2015-04-09 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487389#comment-14487389
 ] 

Dmitry Kan commented on SOLR-4722:
--

Thanks for the great patch. I confirm it works in solr 4.10.3, although 
recompilation was necessary.

> Highlighter which generates a list of query term position(s) for each item in 
> a list of documents, or returns null if highlighting is disabled.
> ---
>
> Key: SOLR-4722
> URL: https://issues.apache.org/jira/browse/SOLR-4722
> Project: Solr
>  Issue Type: New Feature
>  Components: highlighter
>Affects Versions: 4.3, Trunk
>Reporter: Tricia Jenkins
>Priority: Minor
> Attachments: SOLR-4722.patch, SOLR-4722.patch, 
> solr-positionshighlighter.jar
>
>
> As an alternative to returning snippets, this highlighter provides the (term) 
> position for query matches.  One usecase for this is to reconcile the term 
> position from the Solr index with 'word' coordinates provided by an OCR 
> process.  In this way we are able to 'highlight' an image, like a page from a 
> book or an article from a newspaper, in the locations that match the user's 
> query.
> This is based on the FastVectorHighlighter and requires that termVectors, 
> termOffsets and termPositions be stored.






[jira] [Commented] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-09-25 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147625#comment-14147625
 ] 

Dmitry Kan commented on SOLR-6152:
--

Ok, I see what you are getting at. I think I like this, sounds useful. This 
jira and what you describe may potentially reuse some code. But these two sound 
like different features to me.

I need to take a first stab at this so that there is something material to 
contemplate. Hoping to get moral support from [~steffkes] too :)



> Pre-populating values into search parameters on the query page of solr admin
> 
>
> Key: SOLR-6152
> URL: https://issues.apache.org/jira/browse/SOLR-6152
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.3.1
>Reporter: Dmitry Kan
> Attachments: prepoluate_query_parameters_query_page.bmp
>
>
> In some use cases, it is highly desirable to be able to pre-populate the 
> query page of solr admin with specific values.
> In particular use case of mine, the solr admin user must pass a date range 
> value without which the query would fail.
> It isn't easy to remember the value format for non-solr experts, so I would 
> like to have a way of hooking that value "example" into the query page.
> See the screenshot attached, where I have inserted the fq parameter with date 
> range into the Raw Query Parameters.






[jira] [Commented] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-09-25 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147600#comment-14147600
 ] 

Dmitry Kan commented on SOLR-6152:
--

I believe the current Solr admin already displays this link right after you 
have executed a query, atop the result set. Or do you mean something else?



> Pre-populating values into search parameters on the query page of solr admin
> 
>
> Key: SOLR-6152
> URL: https://issues.apache.org/jira/browse/SOLR-6152
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.3.1
>Reporter: Dmitry Kan
> Attachments: prepoluate_query_parameters_query_page.bmp
>
>
> In some use cases, it is highly desirable to be able to pre-populate the 
> query page of solr admin with specific values.
> In particular use case of mine, the solr admin user must pass a date range 
> value without which the query would fail.
> It isn't easy to remember the value format for non-solr experts, so I would 
> like to have a way of hooking that value "example" into the query page.
> See the screenshot attached, where I have inserted the fq parameter with date 
> range into the Raw Query Parameters.






[jira] [Commented] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-09-25 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147585#comment-14147585
 ] 

Dmitry Kan commented on SOLR-6152:
--

I'm ready to work on this, but need some guidance for the feature spec. I.e. 
what would be the most natural way of configuring prepopulated values? Should 
it be a UI feature or could it be a special config entry in solrconfig.xml? 
Thoughts?

> Pre-populating values into search parameters on the query page of solr admin
> 
>
> Key: SOLR-6152
> URL: https://issues.apache.org/jira/browse/SOLR-6152
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.3.1
>Reporter: Dmitry Kan
> Attachments: prepoluate_query_parameters_query_page.bmp
>
>
> In some use cases, it is highly desirable to be able to pre-populate the 
> query page of solr admin with specific values.
> In particular use case of mine, the solr admin user must pass a date range 
> value without which the query would fail.
> It isn't easy to remember the value format for non-solr experts, so I would 
> like to have a way of hooking that value "example" into the query page.
> See the screenshot attached, where I have inserted the fq parameter with date 
> range into the Raw Query Parameters.






Re: How to traverse automaton with new api?

2014-09-24 Thread Dmitry Kan
Thanks a lot, Mike! I should have checked it first thing!
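
For anyone else landing here, the toDot-style traversal against the new API
looks roughly like this (in the new API state 0 is always the initial state,
and a single Transition instance is reused as a cursor):

import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Transition;

class AutomatonWalk {
  static void dump(Automaton a) {
    Transition t = new Transition();
    for (int state = 0; state < a.getNumStates(); state++) {
      StringBuilder msg = new StringBuilder();
      msg.append(state);
      if (state == 0) {
        msg.append(" INITIAL");
      }
      msg.append(a.isAccept(state) ? " [accept]" : " [reject]");
      int count = a.initTransition(state, t);
      msg.append(", ").append(count).append(" transitions");
      for (int i = 0; i < count; i++) {
        a.getNextTransition(t);
        // t.min / t.max are the label range, t.dest the target state
      }
      System.out.println(msg);
    }
  }
}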

On 23 September 2014 17:47, Michael McCandless 
wrote:

> Try looking at the sources for Automaton.toDot?  It does a similar
> traversal...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Sep 23, 2014 at 9:50 AM, Dmitry Kan 
> wrote:
> > o.a.l.u.automaton.Automaton api has changed in lucene 4.10
> > (
> https://issues.apache.org/jira/secure/attachment/12651171/LUCENE-5752.patch
> ).
> >
> > The method getNumberedStates() was dropped, and the State class does not
> > exist anymore.
> >
> > In the Automaton api before 4.10 the traversal could be achieved like
> this:
> >
> > // Automaton a;
> > State[] states = a.getNumberedStates();
> > for (State s : states) {
> >   StringBuilder msg = new StringBuilder();
> >   msg.append(String.valueOf(s.getNumber()));
> >   if (a.getInitialState() == s) {
> > msg.append(" INITIAL");
> >   }
> >   msg.append(s.isAccept() ? " [accept]" : " [reject]");
> >   msg.append(", " + s.numTransitions + " transitions");
> >   for (Transition t : s.getTransitions()) {
> > // do something with transitions
> >   }
> >   log.info(msg);
> > }
> >
> > Can anybody help on how to traverse an existing Automaton object with the
> > new API?
> >
> > Thanks,
> > Dmitry
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


How to traverse automaton with new api?

2014-09-23 Thread Dmitry Kan
The o.a.l.u.automaton.Automaton API has changed in Lucene 4.10 (
https://issues.apache.org/jira/secure/attachment/12651171/LUCENE-5752.patch
).

The getNumberedStates() method was dropped, and the State class no longer exists.

In the Automaton api before 4.10 the traversal could be achieved like this:

// Automaton a;
State[] states = a.getNumberedStates();
for (State s : states) {
  StringBuilder msg = new StringBuilder();
  msg.append(String.valueOf(s.getNumber()));
  if (a.getInitialState() == s) {
msg.append(" INITIAL");
  }
  msg.append(s.isAccept() ? " [accept]" : " [reject]");
  msg.append(", " + s.numTransitions + " transitions");
  for (Transition t : s.getTransitions()) {
// do something with transitions
  }
  log.info(msg);
}

Can anybody help on how to traverse an existing Automaton object with the
new API?

Thanks,
Dmitry
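
For anyone else landing on this thread: below is a minimal traversal sketch
against the post-4.10 API, pieced together from the 4.10 javadocs and the
Automaton.toDot() source Mike points at above. It assumes states are plain
ints, state 0 is always the initial state, and transitions are read through a
reusable Transition cursor; treat it as a starting point, not tested code.

// Automaton a;
Transition t = new Transition();                   // reusable cursor
for (int state = 0; state < a.getNumStates(); state++) {
  StringBuilder msg = new StringBuilder();
  msg.append(state);
  if (state == 0) {
    msg.append(" INITIAL");                        // state 0 is initial in 4.10+
  }
  msg.append(a.isAccept(state) ? " [accept]" : " [reject]");
  int numTransitions = a.initTransition(state, t); // position cursor on state
  msg.append(", " + numTransitions + " transitions");
  for (int i = 0; i < numTransitions; i++) {
    a.getNextTransition(t);                        // fills t.min, t.max, t.dest
    // do something with the transition t
  }
  log.info(msg);
}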


[jira] [Updated] (SOLR-5178) Admin UI - Memory Graph on Dashboard shows NaN for unused Swap

2014-08-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5178:
-

Attachment: SOLR-5178.patch

A patch for Solr 4.6.0. It adds a check for when both free swap and total swap 
are 0 (dividing one by the other yields NaN).
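
The guard itself is simple arithmetic; a hypothetical sketch follows (the 
actual dashboard code is JavaScript, but the check is the same in any language):

{code}
// Sketch of the divide-by-zero guard: when a system has no swap configured,
// freeSwapSpaceSize and totalSwapSpaceSize are both 0, and 0/0 yields NaN.
static double swapUsagePercent(long freeSwap, long totalSwap) {
  if (totalSwap == 0) {
    return 0.0;  // report 0% used instead of NaN
  }
  return 100.0 * (totalSwap - freeSwap) / totalSwap;
}
{code}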

> Admin UI - Memory Graph on Dashboard shows NaN for unused Swap
> --
>
> Key: SOLR-5178
> URL: https://issues.apache.org/jira/browse/SOLR-5178
> Project: Solr
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 4.3, 4.4
>Reporter: Stefan Matheis (steffkes)
>Assignee: Stefan Matheis (steffkes)
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5178.patch, screenshot-vladimir.jpeg
>
>
> If the System doesn't use Swap, the displayed memory graph on the dashboard 
> shows {{NaN}} (not a number) because it tries to divide by zero.
> {code}"system":{
>   "name":"Linux",
>   "version":"3.2.0-39-virtual",
>   "arch":"amd64",
>   "systemLoadAverage":3.38,
>   "committedVirtualMemorySize":32454287360,
>   "freePhysicalMemorySize":912945152,
>   "freeSwapSpaceSize":0,
>   "processCpuTime":5627465000,
>   "totalPhysicalMemorySize":71881908224,
>   "totalSwapSpaceSize":0,
>   "openFileDescriptorCount":350,
>   "maxFileDescriptorCount":4096,
>   "uname": "Linux ip-xxx-xxx-xxx-xxx 3.2.0-39-virtual #62-Ubuntu SMP Thu 
> Feb 28 00:48:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux\n",
>   "uptime":" 11:24:39 up 4 days, 23:03, 1 user, load average: 3.38, 3.10, 
> 2.95\n"
> }{code}
> We should add an additional check for that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3585) processing updates in multiple threads

2014-06-11 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028325#comment-14028325
 ] 

Dmitry Kan commented on SOLR-3585:
--

I would agree with [~dsmiley]. Every good API (and to some extent Solr is an 
API from the client's point of view) takes advantage of multi-threading by itself. 
In this case a client can be as thin as possible and not care about threads. And 
if the client has enough idle CPUs, sure, it could post in parallel. For example, 
we run Solr on pretty beefy machines with lots of CPU cores, and most of the time 
those are idling.

Some of our latest findings with soft commits and high posting pressure 
show that posting may sometimes even fail, and re-posting the failed docs fixes 
the issue.

> processing updates in multiple threads
> --
>
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0-ALPHA, 5.0
>Reporter: Mikhail Khludnev
> Attachments: SOLR-3585.patch, SOLR-3585.patch, multithreadupd.patch, 
> report.tar.gz
>
>
> Hello,
> I'd like to contribute an update processor which forks many threads that 
> concurrently process the stream of commands. It may be beneficial for users 
> who stream many docs through a single request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-06-09 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-6152:


 Summary: Pre-populating values into search parameters on the query 
page of solr admin
 Key: SOLR-6152
 URL: https://issues.apache.org/jira/browse/SOLR-6152
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.3.1
Reporter: Dmitry Kan
 Attachments: prepoluate_query_parameters_query_page.bmp

In some use cases, it is highly desirable to be able to pre-populate the query 
page of solr admin with specific values.

In a particular use case of mine, the solr admin user must pass a date range 
value without which the query would fail.

It isn't easy for non-solr experts to remember the value format, so I would 
like to have a way of hooking an example value into the query page.

See the screenshot attached, where I have inserted the fq parameter with a date 
range into the Raw Query Parameters.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-06-09 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-6152:
-

Attachment: prepoluate_query_parameters_query_page.bmp

Screenshot of the query page.

> Pre-populating values into search parameters on the query page of solr admin
> 
>
> Key: SOLR-6152
> URL: https://issues.apache.org/jira/browse/SOLR-6152
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.3.1
>    Reporter: Dmitry Kan
> Attachments: prepoluate_query_parameters_query_page.bmp
>
>
> In some use cases, it is highly desirable to be able to pre-populate the 
> query page of solr admin with specific values.
> In a particular use case of mine, the solr admin user must pass a date range 
> value without which the query would fail.
> It isn't easy for non-solr experts to remember the value format, so I would 
> like to have a way of hooking an example value into the query page.
> See the screenshot attached, where I have inserted the fq parameter with a date 
> range into the Raw Query Parameters.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-23 Thread Dmitry Kan
460)
   [junit4]> at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
   [junit4]> at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
   [junit4]> at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
   [junit4]> at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
   [junit4]> at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   [junit4]> at java.lang.Thread.run(Thread.java:745)
   [junit4]> at
__randomizedtesting.SeedInfo.seed([4E981035AE883718]:0)Throwable #2:
com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie
threads that couldn't be terminated:
   [junit4]>1) Thread[id=17, name=IPC Parameter Sending Thread #0,
state=TIMED_WAITING, group=TGRP-MorphlineMapperTest]
   [junit4]> at sun.misc.Unsafe.park(Native Method)
   [junit4]> at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
   [junit4]> at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
   [junit4]> at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
   [junit4]> at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
   [junit4]> at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
   [junit4]> at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
   [junit4]> at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   [junit4]> at java.lang.Thread.run(Thread.java:745)
   [junit4]> at __randomizedtesting.SeedInfo.seed([4E981035AE883718]:0)
   [junit4] Completed on J3 in 31.20s, 1 test, 2 errors <<< FAILURES!
   [junit4]
   [junit4]
   [junit4] Tests with failures:
   [junit4]   - org.apache.solr.hadoop.MorphlineMapperTest (suite)

Dmitry Kan

On 23 April 2014 12:08, Michael McCandless wrote:

> +1
>
> SUCCESS! [0:44:41.170815]
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Apr 22, 2014 at 2:47 PM, Uwe Schindler  wrote:
> > Hi,
> >
> > I prepared the first release candidate of Lucene and Solr 4.8.0. The
> artifacts can be found here:
> >
> > =>
> http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC1-rev1589150/
> >
> > It took a bit longer, because we had to fix some remaining bugs
> regarding NativeFSLockFactory, which did not work correctly and leaked file
> handles. I also updated the instructions about the preferred Java update
> versions. See also Mike's blog post:
> http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/
> >
> > Please check the artifacts and give your vote in the next 72 hrs.
> >
> > My +1 will hopefully come a little bit later because Solr tests are
> failing constantly on my release build and smoke tester machine. The
> reason: it seems to be a lack of file handles. A standard Ubuntu
> configuration has 1024 file handles and I want a release to pass with that
> common "default" configuration. Instead,
> org.apache.solr.cloud.TestMiniSolrCloudCluster.testBasics always fails with
> crazy error messages (not about too few file handles, rather that Jetty
> cannot start up or bind ports, or various other stuff). This did not
> happen when smoke-testing the 4.7.x releases.
> >
> > I will now run the smoker again without HDFS (via build.properties) and,
> if that also fails, then once again with more file handles. But we really
> have to fix our tests so that they succeed with the default config of 1024
> file handles. We can configure that in Jenkins (so the Jenkins job first
> sets "ulimit -n 1024" and then runs ant). But this should not block the
> release, I just say: "I gave up running those Solr tests, sorry! Anybody
> else can test that stuff!"
> >
> > Uwe
> >
> > P.S.: Here's my smoker command line:
> > $  JAVA_HOME=$HOME/jdk1.7.0_55 JAVA7_HOME=$HOME/jdk1.7.0_55 python3.2 -u
> smokeTestRelease.py '
> http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC1-rev1589150/'
> 1589150 4.8.0 tmp
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2014-03-24 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Labels:   (was: patch)

> Solr sends all doc ids to all shards in the query counting facets
> -
>
> Key: SOLR-4903
> URL: https://issues.apache.org/jira/browse/SOLR-4903
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.4, 4.3, 4.3.1
>    Reporter: Dmitry Kan
>
> Setup: front end solr and shards.
> Summary: solr frontend sends all doc ids received from QueryComponent to all 
> shards which causes POST request buffer size overflow.
> Symptoms:
> The query is: http://pastebin.com/0DndK1Cs
> I have omitted the shards parameter.
> The router log: http://pastebin.com/FTVH1WF3
> Notice the port of the shard that is affected. That port changes all the time, 
> even for the same request.
> The log entry is prepended with lines:
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
> Internal Server Error
> (they are not in the pastebin link)
> The shard log: http://pastebin.com/exwCx3LX
> Suggestion: change the data structure in FacetComponent to send only doc ids 
> that belong to a shard and not a concatenation of all doc ids.
> Why is this important: for scaling. Adding more shards will result in 
> overflowing the POST request buffer at some point anyway.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2014-03-24 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Labels: patch  (was: )

> Solr sends all doc ids to all shards in the query counting facets
> -
>
> Key: SOLR-4903
> URL: https://issues.apache.org/jira/browse/SOLR-4903
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.4, 4.3, 4.3.1
>    Reporter: Dmitry Kan
>
> Setup: front end solr and shards.
> Summary: solr frontend sends all doc ids received from QueryComponent to all 
> shards which causes POST request buffer size overflow.
> Symptoms:
> The query is: http://pastebin.com/0DndK1Cs
> I have omitted the shards parameter.
> The router log: http://pastebin.com/FTVH1WF3
> Notice the port of the shard that is affected. That port changes all the time, 
> even for the same request.
> The log entry is prepended with lines:
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
> Internal Server Error
> (they are not in the pastebin link)
> The shard log: http://pastebin.com/exwCx3LX
> Suggestion: change the data structure in FacetComponent to send only doc ids 
> that belong to a shard and not a concatenation of all doc ids.
> Why is this important: for scaling. Adding more shards will result in 
> overflowing the POST request buffer at some point anyway.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2014-03-21 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943275#comment-13943275
 ] 

Dmitry Kan commented on SOLR-5394:
--

[~mikemccand] can you reproduce the bug with the patch?

> facet.method=fcs seems to be using threads when it shouldn't
> 
>
> Key: SOLR-5394
> URL: https://issues.apache.org/jira/browse/SOLR-5394
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Michael McCandless
> Attachments: SOLR-5394.patch, SOLR-5394.patch, 
> SOLR-5394_keep_threads_original_value.patch
>
>
> I built a wikipedia index, with multiple fields for faceting.
> When I do facet.method=fcs with facet.field=dateFacet and 
> facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
> bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
> But I thought threads for each field is turned off by default?
> Even if I add facet.threads=0, it still spins up all the threads.
> I think something is wrong in SimpleFacets.parseParams; somehow, that method 
> returns early (because localParams is null), leaving threads=-1, and then the 
> later code that would have set threads to 0 never runs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2014-03-20 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: SOLR-5394.patch

This patch sets the default threads to 1 (single thread execution) as per 
Vitaly's suggestion. Fixed the test case with an unspecified threads parameter: 
the number of threads is expected to be the default (=1). The tests in 
TestSimpleFacet pass.
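
For readers of this issue, a sketch of the clamping idea only (not the actual 
SimpleFacets code; the helper name is made up, and SolrParams here is 
org.apache.solr.common.params.SolrParams, whose getInt(String) returns null for 
an absent parameter):

{code}
// Illustrative only: treat a missing or non-positive facet.threads value as
// single-threaded execution, so an early return can never leave it at -1.
static int effectiveFacetThreads(SolrParams params) {
  Integer requested = params.getInt("facet.threads"); // null if absent
  if (requested == null || requested <= 0) {
    return 1; // default: single-thread execution
  }
  return requested;
}
{code}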

> facet.method=fcs seems to be using threads when it shouldn't
> 
>
> Key: SOLR-5394
> URL: https://issues.apache.org/jira/browse/SOLR-5394
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Michael McCandless
> Attachments: SOLR-5394.patch, SOLR-5394.patch, 
> SOLR-5394_keep_threads_original_value.patch
>
>
> I built a wikipedia index, with multiple fields for faceting.
> When I do facet.method=fcs with facet.field=dateFacet and 
> facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
> bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
> But I thought threads for each field is turned off by default?
> Even if I add facet.threads=0, it still spins up all the threads.
> I think something is wrong in SimpleFacets.parseParams; somehow, that method 
> returns early (because localParams is null), leaving threads=-1, and then the 
> later code that would have set threads to 0 never runs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.

2014-03-16 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937293#comment-13937293
 ] 

Dmitry Kan commented on LUCENE-3758:


[~erickerickson] right, agreed: this should be handled in another jira as a 
local param. We have implemented this as an operator, since we allow mixing 
ordered and unordered clauses in the same query.
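
For context, the flag in question is the third constructor argument of 
SpanNearQuery; a minimal sketch, with a made-up field and terms:

{code}
// Illustration of the inOrder flag this issue wants to make configurable;
// the parser currently hardcodes it to true.
static SpanNearQuery proximity(boolean inOrder) {
  SpanQuery[] clauses = new SpanQuery[] {
      new SpanTermQuery(new Term("body", "quick")),
      new SpanTermQuery(new Term("body", "fox"))
  };
  return new SpanNearQuery(clauses, 2 /* slop */, inOrder);
}
{code}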

> Allow the ComplexPhraseQueryParser to search order or un-order proximity 
> queries.
> -
>
> Key: LUCENE-3758
> URL: https://issues.apache.org/jira/browse/LUCENE-3758
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Affects Versions: 4.0-ALPHA
>Reporter: Tomás Fernández Löbbe
>Assignee: Erick Erickson
>Priority: Minor
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-3758.patch, LUCENE-3758.patch, LUCENE-3758.patch
>
>
> The ComplexPhraseQueryParser uses SpanNearQuery, but always sets the "inOrder" 
> value hardcoded to "true". This could be configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4904) Send internal doc ids and index version in distributed faceting to make queries more compact

2014-03-11 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930792#comment-13930792
 ] 

Dmitry Kan commented on SOLR-4904:
--

[~kamaci] yes, it is still valid. I would imagine that for some extreme commit 
policy cases, like soft-committing every second, this might not be a good fit 
(as the index changes so fast), but for other cases this sounds like a good idea.

> Send internal doc ids and index version in distributed faceting to make 
> queries more compact
> 
>
> Key: SOLR-4904
> URL: https://issues.apache.org/jira/browse/SOLR-4904
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.4, 4.3
>Reporter: Dmitry Kan
>
> This was suggested by [~ab] at the bbuzz conf 2013. This makes a lot of sense and 
> works nicely with fixing the root cause of issue SOLR-4903.
> Basically QueryComponent could send internal lucene ids along with the index 
> version number so that in subsequent queries to other solr components, like 
> FacetComponent, the internal ids would be sent. The index version is required 
> to ensure we deal with the same index.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ant idea on fresh checkout of 4.7

2014-03-11 Thread Dmitry Kan
Hi Steve!
Thanks for sharing the ivy2 org.kitesdk cache piece, it compiles fine (phew,
only a 5 minute 16 second build). Looking forward to the Cloudera maven
repo resolution.
Dmitry


On 11 March 2014 15:16, Steve Rowe  wrote:

> 0.11.0 is the morphlines version specified for trunk, branch_4x, and the
> lucene_solr_4_7 branch, e.g. from trunk ivy-versions.properties:
>
> org.kitesdk.kite-morphlines.version = 0.11.0
>
> AFAICT, version 0.11.0 of those two artifacts is not available from the
> cloudera repositories - I can see 0.10.0, 0.10.0-* (several), and 0.12.0,
> but no 0.11.0:
>
><
> http://repository.cloudera.com/cloudera/libs-release-local/org/kitesdk/kite-morphlines-hadoop-sequencefile/
> >
><
> http://repository.cloudera.com/cloudera/libs-release-local/org/kitesdk/kite-morphlines-saxon/
> >
>
><
> https://repository.cloudera.com/cloudera/repo/org/kitesdk/kite-morphlines-hadoop-sequencefile/
> >
><
> https://repository.cloudera.com/cloudera/repo/org/kitesdk/kite-morphlines-saxon/
> >
>
> Mark Miller, do you know what's going on?
>
> I made a tarball of all the 0.11.0 files under my
> ~/.ivy2/cache/org.kitesdk/ directory and put them here:
>
> <
> http://people.apache.org/~sarowe/solr-dependencies-org.kitesdk-0.11.0.tar.bz2
> >
>
> Steve
>
> On Mar 11, 2014, at 6:17 AM, Grant Ingersoll  wrote:
>
> > Hi,
> >
> > I did a fresh checkout of 4.7 from SVN and ran ant idea at the top level
> and I get [1].  I presume I am missing the CDH Ivy repo somewhere.  Anyone
> have the bits that need to be added handy?
> >
> >
> >
> > [1]
> > :: problems summary ::
> > [ivy:retrieve]  WARNINGS
> > [ivy:retrieve]module not found:
> org.kitesdk#kite-morphlines-saxon;0.11.0
> > [ivy:retrieve] local: tried
> > [ivy:retrieve]
>  
> /pathUsers/grantingersoll/.ivy2/local/org.kitesdk/kite-morphlines-saxon/0.11.0/ivys/ivy.xml
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-saxon.jar:
> > [ivy:retrieve]
>  
> /path/.ivy2/local/org.kitesdk/kite-morphlines-saxon/0.11.0/jars/kite-morphlines-saxon.jar
> > [ivy:retrieve] shared: tried
> > [ivy:retrieve]
>  /path/.ivy2/shared/org.kitesdk/kite-morphlines-saxon/0.11.0/ivys/ivy.xml
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-saxon.jar:
> > [ivy:retrieve]
>  
> /path/.ivy2/shared/org.kitesdk/kite-morphlines-saxon/0.11.0/jars/kite-morphlines-saxon.jar
> > [ivy:retrieve] public: tried
> > [ivy:retrieve]
> http://repo1.maven.org/maven2/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.pom
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-saxon.jar:
> > [ivy:retrieve]
> http://repo1.maven.org/maven2/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.jar
> > [ivy:retrieve] cloudera: tried
> > [ivy:retrieve]
> https://repository.cloudera.com/artifactory/repo/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.pom
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-saxon.jar:
> > [ivy:retrieve]
> https://repository.cloudera.com/artifactory/repo/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.jar
> > [ivy:retrieve] releases.cloudera.com: tried
> > [ivy:retrieve]
> https://repository.cloudera.com/content/repositories/releases/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.pom
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-saxon.jar:
> > [ivy:retrieve]
> https://repository.cloudera.com/content/repositories/releases/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.jar
> > [ivy:retrieve] sonatype-releases: tried
> > [ivy:retrieve]
> http://oss.sonatype.org/content/repositories/releases/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.pom
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-saxon.jar:
> > [ivy:retrieve]
> http://oss.sonatype.org/content/repositories/releases/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.jar
> > [ivy:retrieve] maven.restlet.org: tried
> > [ivy:retrieve]
> http://maven.restlet.org/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.pom
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-saxon.jar:
> > [ivy:retrieve]
> http://maven.restlet.org/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.jar
> > [ivy:retrieve] svnkit-releases: tried
> > [ivy:retrieve]
> http://maven.tmatesoft.com/content/repositories/releases/org/kitesdk/kite-morphlines-saxon/0.11.0/kite-morphlines-saxon-0.11.0.pom
> > [ivy:retrieve]  -- artifact
> org.kitesdk#kite-morphlines-saxon;0.11.0!kite-morphlines-sax

[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication

2014-03-10 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926081#comment-13926081
 ] 

Dmitry Kan edited comment on LUCENE-5422 at 3/10/14 7:27 PM:
-

I agree with [~mikemccand] in that the issue should be better scoped. The case 
with compressing stemmed / non-stemmed terms posting lists is quite tricky and 
requires more thought.

One clear case for this issue is storing reversed term along with its original 
non-reversed version. Both should point to the same posting list (subject to 
some after-stemming-hash-check).

What do you guys think?


was (Author: dmitry_key):
I agree with [~mikemccand] in that the issue should be better scoped. The case 
with compressing stemmed / non-stemmed terms posting lists is quite tricky and 
requires more thought.

One clear case for this issue is storing reversed term along with it is 
original non-reversed version. Both should point to the same posting list 
(subject to some after-stemming-hash-check).

What do you guys think?

> Postings lists deduplication
> 
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>    Reporter: Dmitry Kan
>  Labels: gsoc2014
>
> The context:
> http://markmail.org/thread/tywtrjjcfdbzww6f
> Robert Muir and I have discussed what Robert eventually named "postings
> lists deduplication" at Berlin Buzzwords 2013 conference.
> The idea is to allow multiple terms to point to the same postings list to
> save space. This can be achieved by new index codec implementation, but this 
> jira is open to other ideas as well.
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed term etc.
> For example, at the moment, when supporting exact (unstemmed) and inexact 
> (stemmed)
> searches, we store both unstemmed and stemmed variant of a word form and
> that leads to index bloating. That is why we had to remove the leading
> wildcard support via reversing a token on index and query time because of
> the same index size considerations.
> Comment from Mike McCandless:
> Neat idea!
> Would this idea allow a single term to point to (the union of) N other
> posting lists?  It seems like that's necessary e.g. to handle the
> exact/inexact case.
> And then, to produce the Docs/AndPositionsEnum you'd need to do the
> merge sort across those N posting lists?
> Such a thing might also be do-able as runtime only wrapper around the
> postings API (FieldsProducer), if you could at runtime do the reverse
> expansion (e.g. stem -> all of its surface forms).
> Comment from Robert Muir:
> I think the exact/inexact is trickier (detecting it would be the hard
> part), and you are right, another solution might work better.
> but for the reverse wildcard and synonyms situation, it seems we could even
> detect it on write if we created some hash of the previous terms postings.
> if the hash matches for the current term, we know it might be a "duplicate"
> and would have to actually do the costly check they are the same.
> maybe there are better ways to do it, but it might be a fun postingformat
> experiment to try.
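
A hypothetical sketch of the write-time hash-then-verify idea from Robert's 
comment above (all names are made up; a real implementation would live inside a 
postings format rather than operate on plain arrays):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Cheaply fingerprint each term's postings; only on a fingerprint match pay
// for the exact equality check before pointing the term at the shared list.
class PostingsInterner {
  private final Map<Integer, List<int[]>> byHash = new HashMap<>();

  int[] intern(int[] postings) {
    int h = Arrays.hashCode(postings);            // cheap fingerprint
    List<int[]> candidates = byHash.get(h);
    if (candidates == null) {
      candidates = new ArrayList<>();
      byHash.put(h, candidates);
    }
    for (int[] candidate : candidates) {
      if (Arrays.equals(candidate, postings)) {   // costly exact check
        return candidate;                         // duplicate: share one list
      }
    }
    candidates.add(postings);                     // first time seen: store it
    return postings;
  }
}
{code}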



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5422) Postings lists deduplication

2014-03-10 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926081#comment-13926081
 ] 

Dmitry Kan commented on LUCENE-5422:


I agree with [~mikemccand] in that the issue should be better scoped. The case 
with compressing stemmed / non-stemmed terms posting lists is quite tricky and 
requires more thought.

One clear case for this issue is storing reversed term along with it is 
original non-reversed version. Both should point to the same posting list 
(subject to some after-stemming-hash-check).

What do you guys think?

> Postings lists deduplication
> 
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>    Reporter: Dmitry Kan
>  Labels: gsoc2014
>
> The context:
> http://markmail.org/thread/tywtrjjcfdbzww6f
> Robert Muir and I have discussed what Robert eventually named "postings
> lists deduplication" at Berlin Buzzwords 2013 conference.
> The idea is to allow multiple terms to point to the same postings list to
> save space. This can be achieved by new index codec implementation, but this 
> jira is open to other ideas as well.
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed term etc.
> For example, at the moment, when supporting exact (unstemmed) and inexact 
> (stemmed)
> searches, we store both unstemmed and stemmed variant of a word form and
> that leads to index bloating. That is why we had to remove the leading
> wildcard support via reversing a token on index and query time because of
> the same index size considerations.
> Comment from Mike McCandless:
> Neat idea!
> Would this idea allow a single term to point to (the union of) N other
> posting lists?  It seems like that's necessary e.g. to handle the
> exact/inexact case.
> And then, to produce the Docs/AndPositionsEnum you'd need to do the
> merge sort across those N posting lists?
> Such a thing might also be do-able as runtime only wrapper around the
> postings API (FieldsProducer), if you could at runtime do the reverse
> expansion (e.g. stem -> all of its surface forms).
> Comment from Robert Muir:
> I think the exact/inexact is trickier (detecting it would be the hard
> part), and you are right, another solution might work better.
> but for the reverse wildcard and synonyms situation, it seems we could even
> detect it on write if we created some hash of the previous terms postings.
> if the hash matches for the current term, we know it might be a "duplicate"
> and would have to actually do the costly check they are the same.
> maybe there are better ways to do it, but it might be a fun postingformat
> experiment to try.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5422) Postings lists deduplication

2014-03-05 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921155#comment-13921155
 ] 

Dmitry Kan commented on LUCENE-5422:


[~Vishmi Money]

LUCENE-2082 deals with segment merging, which is a process performed on 
the Lucene index every now and then.

This jira deals with the index structure and suggests that compression 
of the index can be achieved for certain (described) use cases. While these jiras 
are related, this jira can be considered a standalone project in itself.

perhaps [~otis] could add something?

> Postings lists deduplication
> 
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>    Reporter: Dmitry Kan
>  Labels: gsoc2014
>
> The context:
> http://markmail.org/thread/tywtrjjcfdbzww6f
> Robert Muir and I have discussed what Robert eventually named "postings
> lists deduplication" at Berlin Buzzwords 2013 conference.
> The idea is to allow multiple terms to point to the same postings list to
> save space. This can be achieved by new index codec implementation, but this 
> jira is open to other ideas as well.
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed term etc.
> For example, at the moment, when supporting exact (unstemmed) and inexact 
> (stemmed)
> searches, we store both unstemmed and stemmed variant of a word form and
> that leads to index bloating. That is why we had to remove the leading
> wildcard support via reversing a token on index and query time because of
> the same index size considerations.
> Comment from Mike McCandless:
> Neat idea!
> Would this idea allow a single term to point to (the union of) N other
> posting lists?  It seems like that's necessary e.g. to handle the
> exact/inexact case.
> And then, to produce the Docs/AndPositionsEnum you'd need to do the
> merge sort across those N posting lists?
> Such a thing might also be do-able as runtime only wrapper around the
> postings API (FieldsProducer), if you could at runtime do the reverse
> expansion (e.g. stem -> all of its surface forms).
> Comment from Robert Muir:
> I think the exact/inexact is trickier (detecting it would be the hard
> part), and you are right, another solution might work better.
> but for the reverse wildcard and synonyms situation, it seems we could even
> detect it on write if we created some hash of the previous terms postings.
> if the hash matches for the current term, we know it might be a "duplicate"
> and would have to actually do the costly check they are the same.
> maybe there are better ways to do it, but it might be a fun postingformat
> experiment to try.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication

2014-03-05 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921155#comment-13921155
 ] 

Dmitry Kan edited comment on LUCENE-5422 at 3/5/14 6:23 PM:


[~Vishmi Money]

LUCENE-2082 deals with segment merging, which is a _process_ performed on the 
Lucene index every now and then.

This jira deals with the index _structure_ and suggests that compression of the 
index can be achieved for certain (described) use cases. While these jiras are 
related, this jira can be considered a standalone project in itself.

perhaps [~otis] could add something?


was (Author: dmitry_key):
[~Vishmi Money]

LUCENE-2082 deals with segment merging which is process performed on 
Lucene index every now and then.

This jira deals with the index structure and suggests that compression 
of index can be achieved for certain (described) use cases. While these jiras 
are related, this jira can be considered as standalone project in itself.

perhaps [~otis] could add something?

> Postings lists deduplication
> 
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>    Reporter: Dmitry Kan
>  Labels: gsoc2014
>
> The context:
> http://markmail.org/thread/tywtrjjcfdbzww6f
> Robert Muir and I have discussed what Robert eventually named "postings
> lists deduplication" at Berlin Buzzwords 2013 conference.
> The idea is to allow multiple terms to point to the same postings list to
> save space. This can be achieved by new index codec implementation, but this 
> jira is open to other ideas as well.
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed term etc.
> For example, at the moment, when supporting exact (unstemmed) and inexact 
> (stemmed)
> searches, we store both unstemmed and stemmed variant of a word form and
> that leads to index bloating. That is why we had to remove the leading
> wildcard support via reversing a token on index and query time because of
> the same index size considerations.
> Comment from Mike McCandless:
> Neat idea!
> Would this idea allow a single term to point to (the union of) N other
> posting lists?  It seems like that's necessary e.g. to handle the
> exact/inexact case.
> And then, to produce the Docs/AndPositionsEnum you'd need to do the
> merge sort across those N posting lists?
> Such a thing might also be do-able as runtime only wrapper around the
> postings API (FieldsProducer), if you could at runtime do the reverse
> expansion (e.g. stem -> all of its surface forms).
> Comment from Robert Muir:
> I think the exact/inexact is trickier (detecting it would be the hard
> part), and you are right, another solution might work better.
> but for the reverse wildcard and synonyms situation, it seems we could even
> detect it on write if we created some hash of the previous terms postings.
> if the hash matches for the current term, we know it might be a "duplicate"
> and would have to actually do the costly check they are the same.
> maybe there are better ways to do it, but it might be a fun postingformat
> experiment to try.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-13 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan closed SOLR-5697.



Works as expected with Solr 4.7. See the previous comment.

> Delete by query does not work properly with customly configured query parser
> 
>
> Key: SOLR-5697
> URL: https://issues.apache.org/jira/browse/SOLR-5697
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers, update
>Affects Versions: 4.3.1
>    Reporter: Dmitry Kan
> Fix For: 5.0, 4.7
>
> Attachments: query_parser_maven_project.tgz, shard.tgz
>
>
> The shard with the configuration illustrating the issue is attached. Since 
> the size of the archive exceeds the upload limit, I have dropped the solr.war 
> from the webapps directory. Please add it (SOLR 4.3.1).
> Also attached is an example query parser maven project. The binary has been 
> already deployed onto the lib directories of each core.
> Start the shard using startUp_multicore.sh.
> 1. curl 
> 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
> --data-binary '<delete><query>Title:this_title</query></delete>' -H 
> "Content-type:text/xml"
> This query produces an exception:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int 
> name="QTime">33</int></lst><lst name="error"><str name="msg">Unknown query 
> parser 'lucene'</str><int name="code">400</int></lst>
> </response>
> 2. Change the multicore/metadata/solrconfig.xml and 
> multicore/statements/solrconfig.xml by uncommenting the defType parameters on 
> requestHandler name="/select".
> Issue the same query. The result is the same:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int 
> name="QTime">30</int></lst><lst name="error"><str name="msg">Unknown query 
> parser 'lucene'</str><int name="code">400</int></lst>
> </response>
> 3. Keep the same config as in 2. and specify query parser in the local params:
> curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
> --data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' 
> -H "Content-type:text/xml"
> The result:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int 
> name="QTime">3</int></lst><lst name="error"><str name="msg">no field name 
> specified in query and no default specified via 'df' param</str><int 
> name="code">400</int></lst>
> </response>
> The reason is that our query parser is "misbehaving" in that it 
> removes colons from the input queries => on the server side we get:
> Modified input query: Title:this_title ---> Titlethis_title
> 5593 [qtp2121668094-15] INFO  
> org.apache.solr.update.processor.LogUpdateProcessor  – [metadata] 
> webapp=/solr path=/update params={debugQuery=on&commit=false} {} 0 31
> 5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore  – 
> org.apache.solr.common.SolrException: no field name specified in query and no 
> default specified via 'df' param
>   at 
> org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
>   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
>   at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
>   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
>   at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
>   at 
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
>   at 
> org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
>   at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
>   at 
> org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
>   at 
> org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
>   at 
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
>   at org.apache.solr.handler.loader.XMLLoader.

[jira] [Commented] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-13 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900240#comment-13900240
 ] 

Dmitry Kan commented on SOLR-5697:
--

Hoss: thanks for looking into this. I can confirm all test cases work fine with 
Solr 4.7 (solr-4.7-2014-02-12_02-54-24.tgz). I'm guessing there is very little 
chance this gets backported to Solr 4.3.1? BTW, using the exact same configs didn't 
produce an NPE for Solr 4.7 (it gets thrown, as you said, for 4.6.1 however).

> Delete by query does not work properly with customly configured query parser
> 
>
> Key: SOLR-5697
> URL: https://issues.apache.org/jira/browse/SOLR-5697
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers, update
>Affects Versions: 4.3.1
>Reporter: Dmitry Kan
> Fix For: 5.0, 4.7
>
> Attachments: query_parser_maven_project.tgz, shard.tgz
>
>
> The shard with the configuration illustrating the issue is attached. Since 
> the size of the archive exceeds the upload limit, I have dropped the solr.war 
> from the webapps directory. Please add it (SOLR 4.3.1).
> Also attached is an example query parser maven project. The binary has been 
> already deployed onto the lib directories of each core.
> Start the shard using startUp_multicore.sh.
> 1. curl 
> 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
> --data-binary '<delete><query>Title:this_title</query></delete>' -H 
> "Content-type:text/xml"
> This query produces an exception:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int 
> name="QTime">33</int></lst><lst name="error"><str name="msg">Unknown query 
> parser 'lucene'</str><int name="code">400</int></lst>
> </response>
> 2. Change the multicore/metadata/solrconfig.xml and 
> multicore/statements/solrconfig.xml by uncommenting the defType parameters on 
> requestHandler name="/select".
> Issue the same query. The result is the same:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int 
> name="QTime">30</int></lst><lst name="error"><str name="msg">Unknown query 
> parser 'lucene'</str><int name="code">400</int></lst>
> </response>
> 3. Keep the same config as in 2. and specify query parser in the local params:
> curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
> --data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' 
> -H "Content-type:text/xml"
> The result:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int 
> name="QTime">3</int></lst><lst name="error"><str name="msg">no field name 
> specified in query and no default specified via 'df' param</str><int 
> name="code">400</int></lst>
> </response>
> The reason is that our query parser is "misbehaving" in that it 
> removes colons from the input queries => on the server side we get:
> Modified input query: Title:this_title ---> Titlethis_title
> 5593 [qtp2121668094-15] INFO  
> org.apache.solr.update.processor.LogUpdateProcessor  – [metadata] 
> webapp=/solr path=/update params={debugQuery=on&commit=false} {} 0 31
> 5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore  – 
> org.apache.solr.common.SolrException: no field name specified in query and no 
> default specified via 'df' param
>   at 
> org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
>   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
>   at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
>   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
>   at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
>   at 
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
>   at 
> org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
>   at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
>   at 

[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Description: 
The shard with the configuration illustrating the issue is attached. Since the 
size of the archive exceeds the upload limit, I have dropped the solr.war from 
the webapps directory. Please add it (SOLR 4.3.1).


Also attached is an example query parser maven project. The binary has been 
already deployed onto the lib directories of each core.

Start the shard using startUp_multicore.sh.


1. curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
--data-binary '<delete><query>Title:this_title</query></delete>' -H 
"Content-type:text/xml"

This query produces an exception:



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">33</int></lst>
<lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>



2. Change the multicore/metadata/solrconfig.xml and 
multicore/statements/solrconfig.xml by uncommenting the defType parameters on 
requestHandler name="/select".

Issue the same query. The result is the same:



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">30</int></lst>
<lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>



3. Keep the same config as in 2. and specify query parser in the local params:

curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
--data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' -H 
"Content-type:text/xml"


The result:



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">3</int></lst>
<lst name="error"><str name="msg">no field name specified in query and no default specified via 'df' param</str><int name="code">400</int></lst>
</response>



The reason is that our query parser is "misbehaving" in that it removes 
colons from the input queries => on the server side we get:

Modified input query: Title:this_title ---> Titlethis_title
5593 [qtp2121668094-15] INFO  
org.apache.solr.update.processor.LogUpdateProcessor  – [metadata] webapp=/solr 
path=/update params={debugQuery=on&commit=false} {} 0 31
5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore  – 
org.apache.solr.common.SolrException: no field name specified in query and no 
default specified via 'df' param
at 
org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
at 
org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
at 
org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at 
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
at 
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
at 
org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler

[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Description: 
The shard with the configuration illustrating the issue is attached. Since the 
size of the archive exceeds the upload limit, I have dropped the solr.war from 
the webapps directory. Please add it (SOLR 4.3.1).


Also attached is an example query parser maven project. The binary has been 
already deployed onto the lib directories of each core.

Start the shard using startUp_multicore.sh.


1. curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
--data-binary '<delete><query>Title:this_title</query></delete>' -H 
"Content-type:text/xml"

This query produces an exception:



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">33</int></lst>
<lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>



2. Change the multicore/metadata/solrconfig.xml and 
multicore/statements/solrconfig.xml by uncommenting the defType parameters on 
requestHandler name="/select".

Issue the same query. The result is the same:



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">30</int></lst>
<lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>



3. Keep the same config as in 2. and specify query parser in the local params:

curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
--data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' -H 
"Content-type:text/xml"


The result:



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">3</int></lst>
<lst name="error"><str name="msg">no field name specified in query and no default specified via 'df' param</str><int name="code">400</int></lst>
</response>



The reason is that our query parser is "misbehaving" in that it removes 
colons from the input queries => on the server side we get:

Modified input query: Title:this_title ---> Titlethis_title
5593 [qtp2121668094-15] INFO  
org.apache.solr.update.processor.LogUpdateProcessor  – [metadata] webapp=/solr 
path=/update params={debugQuery=on&commit=false} {} 0 31
5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore  – 
org.apache.solr.common.SolrException: no field name specified in query and no 
default specified via 'df' param
at 
org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
at 
org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
at 
org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at 
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
at 
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
at 
org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler
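
A parser with the misbehavior described above takes only a few lines to write. 
The following is a hypothetical sketch, not the attached qparser1 project: a 
Solr QParserPlugin (class name invented, written against roughly the 4.x plugin 
API) whose parser strips colons before delegating to the standard "lucene" 
parser, which is exactly how the field name gets lost during delete by query.

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class ColonStrippingQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    // Strip colons: "Title:this_title" becomes "Titlethis_title", so no
    // field name survives parsing and the 'df' fallback kicks in.
    final String stripped = qstr == null ? null : qstr.replace(":", "");
    return new QParser(stripped, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        // Delegate the mangled string to the standard Lucene parser.
        return subQuery(getString(), "lucene").getQuery();
      }
    };
  }
}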

[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Attachment: shard.tgz

shard with config files without solr.war file.

> Delete by query does not work properly with customly configured query parser
> 
>
> Key: SOLR-5697
> URL: https://issues.apache.org/jira/browse/SOLR-5697
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers, update
>Affects Versions: 4.3.1
>    Reporter: Dmitry Kan
> Attachments: query_parser_maven_project.tgz, shard.tgz
>
>


[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Attachment: query_parser_maven_project.tgz

> Delete by query does not work properly with customly configured query parser
> 
>
> Key: SOLR-5697
> URL: https://issues.apache.org/jira/browse/SOLR-5697
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers, update
>Affects Versions: 4.3.1
>    Reporter: Dmitry Kan
> Attachments: query_parser_maven_project.tgz
>
>

[jira] [Created] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-5697:


 Summary: Delete by query does not work properly with customly 
configured query parser
 Key: SOLR-5697
 URL: https://issues.apache.org/jira/browse/SOLR-5697
 Project: Solr
  Issue Type: Bug
  Components: query parsers, update
Affects Versions: 4.3.1
Reporter: Dmitry Kan
 Attachments: query_parser_maven_project.tgz


[jira] [Updated] (LUCENE-5422) Postings lists deduplication

2014-01-30 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated LUCENE-5422:
---

Labels: gsoc2014  (was: )

> Postings lists deduplication
> 
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>        Reporter: Dmitry Kan
>  Labels: gsoc2014
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5422) Postings lists deduplication

2014-01-30 Thread Dmitry Kan (JIRA)
Dmitry Kan created LUCENE-5422:
--

 Summary: Postings lists deduplication
 Key: LUCENE-5422
 URL: https://issues.apache.org/jira/browse/LUCENE-5422
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs, core/index
Reporter: Dmitry Kan


The context:
http://markmail.org/thread/tywtrjjcfdbzww6f

Robert Muir and I have discussed what Robert eventually named "postings
lists deduplication" at Berlin Buzzwords 2013 conference.

The idea is to allow multiple terms to point to the same postings list to
save space. This can be achieved by a new index codec implementation, but this 
jira is open to other ideas as well.

The application / impact of this is positive for synonyms, exact / inexact
terms, leading wildcard support via storing reversed term etc.

For example, at the moment, when supporting exact (unstemmed) and inexact 
(stemmed) searches, we store both the unstemmed and stemmed variants of a word 
form, and that leads to index bloating. For the same index size considerations, 
we had to remove leading wildcard support, which was implemented by reversing a 
token at index and query time.

Comment from Mike McCandless:
Neat idea!

Would this idea allow a single term to point to (the union of) N other
posting lists?  It seems like that's necessary e.g. to handle the
exact/inexact case.

And then, to produce the DocsAndPositionsEnum you'd need to do the
merge sort across those N posting lists?

Such a thing might also be doable as a runtime-only wrapper around the
postings API (FieldsProducer), if you could at runtime do the reverse
expansion (e.g. stem -> all of its surface forms).


Comment from Robert Muir:
I think the exact/inexact case is trickier (detecting it would be the hard
part), and you are right, another solution might work better.

but for the reverse wildcard and synonyms situation, it seems we could even
detect it at write time if we created a hash of the previous term's postings:
if the hash matches for the current term, we know it might be a "duplicate"
and would have to actually do the costly check that they are the same.

maybe there are better ways to do it, but it might be a fun postingformat
experiment to try.
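
To make the hashing idea concrete, here is a hedged sketch (hypothetical class, 
written against modern Lucene reader APIs, so names may differ from the 4.x 
codebase this thread discusses): it hashes each term's postings for one field 
and flags terms whose hashes collide as deduplication candidates, which would 
still need a full postings comparison before actually sharing a list.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

public class PostingsDedupScan {
  public static void scan(LeafReader reader, String field) throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) return; // field not indexed in this segment
    Map<Long, BytesRef> seen = new HashMap<>();
    TermsEnum te = terms.iterator();
    PostingsEnum pe = null;
    for (BytesRef term = te.next(); term != null; term = te.next()) {
      pe = te.postings(pe, PostingsEnum.FREQS);
      long hash = 1;
      for (int doc = pe.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = pe.nextDoc()) {
        hash = 31 * hash + doc;
        hash = 31 * hash + pe.freq();
      }
      BytesRef prev = seen.put(hash, BytesRef.deepCopyOf(term));
      if (prev != null) {
        // Hash equality is only a hint; a real codec would now compare the
        // two postings lists entry by entry before sharing storage.
        System.out.println(prev.utf8ToString() + " and " + term.utf8ToString()
            + " are candidates for sharing a postings list");
      }
    }
  }
}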





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2013-12-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: (was: SOLR-5394_keep_threads_original_value.patch)

> facet.method=fcs seems to be using threads when it shouldn't
> 
>
> Key: SOLR-5394
> URL: https://issues.apache.org/jira/browse/SOLR-5394
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Michael McCandless
> Attachments: SOLR-5394_keep_threads_original_value.patch
>
>
> I built a wikipedia index, with multiple fields for faceting.
> When I do facet.method=fcs with facet.field=dateFacet and 
> facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
> bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
> But I thought threads for each field is turned off by default?
> Even if I add facet.threads=0, it still spins up all the threads.
> I think something is wrong in SimpleFacets.parseParams; somehow, that method 
> returns early (because localParams is null), leaving threads=-1, and then the 
> later code that would have set threads to 0 never runs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2013-12-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: SOLR-5394_keep_threads_original_value.patch




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2013-12-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: SOLR-5394_keep_threads_original_value.patch

While debugging with facet.threads=0, I noticed that when we reach the 
parseParams method, threads=0, and the method resets it to -1, which breaks the 
later logic. So I added a condition around the threads=-1 reset.

I would be happy if someone could review this little patch and give feedback.
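
The gist of the fix, as a self-contained illustration (hypothetical names, not 
the actual SimpleFacets code or the attached patch):

public class FacetThreadsGuardDemo {
  // Mimics the early-return path in a parseParams-like method: 'threads'
  // arrives carrying the request's facet.threads value and must survive.
  static int parseParams(boolean localParamsPresent, int threads) {
    if (!localParamsPresent) {
      // Buggy version did: threads = -1;  -- clobbering an explicit 0.
      if (threads < 0) {
        threads = -1; // only apply the default when nothing was supplied
      }
      return threads;
    }
    return threads;
  }

  public static void main(String[] args) {
    System.out.println(parseParams(false, 0));  // 0: explicit facet.threads=0 kept
    System.out.println(parseParams(false, -1)); // -1: default preserved
  }
}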




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2013-12-11 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845661#comment-13845661
 ] 

Dmitry Kan commented on SOLR-1604:
--

[~rebeccatang] you can define a Solr core (even for a single index) and copy 
the complex phrase parser jar into its lib directory.

https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml

HTH

> Wildcards, ORs etc inside Phrase Queries
> 
>
> Key: SOLR-1604
> URL: https://issues.apache.org/jira/browse/SOLR-1604
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers, search
>Affects Versions: 1.4
>Reporter: Ahmet Arslan
>Priority: Minor
> Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
> ComplexPhrase-4.2.1.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
> ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
> ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, 
> SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch
>
>
> Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
> wildcards, ORs, ranges, fuzzies inside phrase queries.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2013-10-15 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795092#comment-13795092
 ] 

Dmitry Kan commented on SOLR-1726:
--

[~sstults] Thanks for the use case. This leans towards offline as well, but it 
certainly makes sense.
Our current use case is realtime, though, and we are attacking the problem of 
deep paging differently at the moment (on the querying client side).

> Deep Paging and Large Results Improvements
> --
>
> Key: SOLR-1726
> URL: https://issues.apache.org/jira/browse/SOLR-1726
> Project: Solr
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.6
>
> Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
> ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, 
> SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java
>
>
> There are possibly ways to improve collection of "deep paging" results by 
> passing Solr/Lucene more information about the last page of results seen, 
> thereby saving priority queue operations. See LUCENE-2215.
> There may also be better options for retrieving large numbers of rows at a 
> time that are worth exploring; see LUCENE-2127.
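
For reference, the LUCENE-2215 approach is exposed in Lucene as 
IndexSearcher.searchAfter. A hedged sketch (assuming an already built index in 
dir; the class name is invented): the last hit of each page is passed back so 
the collector can skip everything already seen instead of re-ranking it.

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class DeepPagingDemo {
  public static void pageThrough(Directory dir, Query query) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      ScoreDoc last = null;
      while (true) {
        TopDocs page = (last == null)
            ? searcher.search(query, 10)             // first page
            : searcher.searchAfter(last, query, 10); // resume after previous page
        if (page.scoreDocs.length == 0) break;
        last = page.scoreDocs[page.scoreDocs.length - 1];
        // process page.scoreDocs here
      }
    }
  }
}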



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-26 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-5277:


 Summary: Stamp core names on log entries for certain classes
 Key: SOLR-5277
 URL: https://issues.apache.org/jira/browse/SOLR-5277
 Project: Solr
  Issue Type: Bug
  Components: search, update
Affects Versions: 4.4, 4.3.1, 4.5
Reporter: Dmitry Kan


It is handy that certain Java classes stamp a [coreName] on a log entry. It 
would be useful in a multicore setup if more classes stamped this 
information.

In particular, we came across a situation with commits arriving in quick 
succession to the same multicore shard, and we found it hard to figure out 
whether it was the same core or different cores.

The classes in question with log sample output:

o.a.s.c.SolrCore

06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
SolrDeletionPolicy.onCommit: commits:num=2

11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.core.SolrCore 
- Soft AutoCommit: if uncommited for 1000ms;



o.a.s.u.UpdateHandler

14:45:24.447 [commitScheduler-9-thread-1] INFO  
org.apache.solr.update.UpdateHandler - start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler - 
end_commit_flush



o.a.s.s.SolrIndexSearcher

14:45:24.553 [commitScheduler-7-thread-1] INFO  
org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main


The original question was posted on #solr and on SO:

http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j
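
One way to get such stamping uniformly, sketched here as a hypothetical 
illustration rather than Solr's actual implementation: put the core name into 
SLF4J's MDC, and let a log4j ConversionPattern containing %X{core} (e.g. 
"%-5p [%X{core}] %c - %m%n") render it on every entry logged by the thread.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class CoreLoggingDemo {
  private static final Logger log = LoggerFactory.getLogger(CoreLoggingDemo.class);

  public static void main(String[] args) {
    MDC.put("core", "metadata"); // hypothetical core name
    try {
      // With the pattern above this renders as: INFO  [metadata] ... end_commit_flush
      log.info("end_commit_flush");
    } finally {
      MDC.remove("core"); // always clean up the thread-local context
    }
  }
}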


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: solr performance testing

2013-09-01 Thread Dmitry Kan
Hello!

Have a look at the solrjmeter tool. It uses JMeter under the hood and provides 
a command-line-friendly interface.

https://github.com/romanchyla/solrjmeter

Dmitry Kan
Mikhail,

Your best bet would be loading up the system using a JMeter server+client 
setup and firing hundreds of queries per second. For monitoring, you can enable 
JMX and use jconsole, or use analytical/monitoring tools; LoadUI.org has a free 
tool which can load the system up.

When you say benchmarking what are the key attributes that you are looking
at?

Thanks & Regards,
Kranti K Parisa
http://www.linkedin.com/in/krantiparisa



On Sat, Aug 31, 2013 at 3:44 PM, Erick Erickson wrote:

> Solrmeter can load the system up much more heavily than a few calls a
> minute as I remember, although I'm not sure how up-to-date it is at this
> point.
>
>  Erick
>
>
> On Sat, Aug 31, 2013 at 3:27 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>> Hello Kranti,
>>
>> Definitely not Solrmeter; last time I saw it, it produced only a few calls 
>> per minute. Hence it's an analytical/monitoring tool like NewRelic & 
>> Sematext. I am rather looking for a low-level benchmarking tool. I see Lucene 
>> has luceneutil (http://code.google.com/a/apache-extras.org/p/luceneutil/) and 
>> lucene/benchmark, and I wonder which of them is easier to adapt for
>> testing Solr.
>>
>>
>> On Sat, Aug 31, 2013 at 11:21 AM, Kranti Parisa 
>> wrote:
>>
>>> you can try
>>> https://code.google.com/p/solrmeter/
>>>
>>> or you can also run JMeter tests.
>>>
>>> or try free trails given by NewRelic or Sematext, they both have
>>> extensive stats for the Solr instances
>>>
>>> Thanks & Regards,
>>> Kranti K Parisa
>>> http://www.linkedin.com/in/krantiparisa
>>>
>>>
>>>
>>> On Thu, Aug 29, 2013 at 6:32 AM, Mikhail Khludnev <
>>> mkhlud...@griddynamics.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> afaik http://code.google.com/a/apache-extras.org/p/luceneutil/ is used
>>>> for testing Lucene performance. What about Solr? Is it also supported, or
>>>> is there a separate well-known facility?
>>>>
>>>> Thanks in advance
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>>
>>>> <http://www.griddynamics.com>
>>>>  
>>>>
>>>
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>>  
>>
>
>


[jira] [Commented] (SOLR-5200) Add REST support for reading and modifying Solr configuration

2013-08-30 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754452#comment-13754452
 ] 

Dmitry Kan commented on SOLR-5200:
--

One parameter relevant to us is mergeFactor.

> Add REST support for reading and modifying Solr configuration
> -
>
> Key: SOLR-5200
> URL: https://issues.apache.org/jira/browse/SOLR-5200
> Project: Solr
>  Issue Type: New Feature
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>
> There should be a REST API to allow full read access to, and write access to 
> some elements of, Solr's per-core and per-node configuration not already 
> covered by the Schema REST API: 
> {{solrconfig.xml}}/{{core.properties}}/{{solrcore.properties}} and 
> {{solr.xml}}/{{solr.properties}} (SOLR-4718 discusses addition of 
> {{solr.properties}}).
> Use cases for runtime configuration modification include scripted setup, 
> troubleshooting, and tuning.
> Tentative rules-of-thumb about configuration items that should not be 
> modifiable at runtime:
> # Startup-only items, e.g. where to start core discovery
> # Items that are deprecated in 4.X and will be removed in 5.0
> # Items that if modified should be followed by a full re-index
> Some issues to consider:
> Persistence: How (and even whether) to handle persistence for configuration 
> modifications via REST API is not clear - e.g. persisting the entire config 
> file or having one or more sidecar config files that get persisted.  The 
> extent of what should be modifiable will likely affect how persistence is 
> implemented.  For example, if the only {{solrconfig.xml}} modifiable items 
> turn out to be plugin configurations, an alternative to 
> full-{{solrconfig.xml}} persistence could be individual plugin registration 
> of runtime config modifiable items, along with per-plugin sidecar config 
> persistence.
> "Live" reload: Most (if not all) per-core configuration modifications will 
> require core reload, though it will be a "live" reload, so some things won't 
> be modifiable, e.g. {{}} and {{IndexWriter}} related settings in 
> {{}} - see SOLR-3592.  (Should a full reload be supported to 
> handle changes in these places?)
> Interpolation aka property substitution: I think it would be useful on read 
> access to optionally return raw values in addition to the interpolated 
> values, e.g. {{solr.xml}} {{hostPort}} raw value {{$\{jetty.port:8983}}} vs. 
> interpolated value {{8983}}.   Modification requests will accept raw values - 
> property interpolation will be applied.  At present interpolation is done 
> once, at parsing time, but if property value modification is supported via 
> the REST API, an alternative could be to delay interpolation until values are 
> requested; in this way, property value modification would not trigger 
> re-parsing the affected configuration source.
> Response format: Similarly to the schema REST API, results could be returned 
> in XML, JSON, or any other response writer's output format.
> Transient cores: How should non-loaded transient cores be handled?  Simplest 
> thing would be to load the transient core before handling the request, just 
> like other requests.
> Below I provide an exhaustive list of configuration items in the files in 
> question and indicate which ones I think could be modifiable at runtime.  I 
> don't mean to imply that these must all be made modifiable, or for those that 
> are made modifiable, that they must be made so at once - a piecemeal approach 
> will very likely be more appropriate.
> h2. {{solrconfig.xml}}
> Note that XIncludes and includes via Document Entities won't survive a 
> modification request (assuming persistence is via overwriting the original 
> file).
> ||XPath under {{/config/}}||Should be modifiable via REST 
> API?||Rationale||Description||
> |{{luceneMatchVersion}}|No|Modifying this should be followed by a full 
> re-index|Controls what version of Lucene various components of Solr adhere to|
> |{{lib}}|Yes|Required for adding plugins at runtime|Contained jars available 
> via classloader for {{solrconfig.xml}} and {{schema.xml}}| 
> |{{dataDir}}|No|Not supported by "live" RELOAD|Holds all index data|
> |{{directoryFactory}}|No|Not supported by "live" RELOAD|index directory 
> factory|
> |{{codecFactory}}|No|Modifying this should be followed by a full 
> re-index|index codec factory, per-field SchemaCodecFactory by default|
> |{{schemaFactory}}|Partial|Although the class shouldn't be modifiable, it 
&g

[solr 4.4.0] SPLITSHARD: small inconvenience

2013-08-02 Thread Dmitry Kan
Hello,

SPLITSHARD is great, but has a small inconvenience in the core-level split
mode.

The following query:

http://localhost:8982/solr/admin/cores?core=statements&action=SPLIT&path=multicore/core11&path=multicore/core12

will create two sub-directories in the multicore dir, core11 and core12, with
the partitioned index of core1, but the directory structure does not follow the
data/index convention. This, I think, in turn does not allow adding the two
new cores properly via the Solr dashboard.

Regards,

Dmitry Kan


Re: Measuring SOLR performance

2013-08-01 Thread Dmitry Kan
Hi Roman,

When I try to run with -q
/home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries

here is what is reported:
Traceback (most recent call last):
  File "solrjmeter.py", line 1390, in 
main(sys.argv)
  File "solrjmeter.py", line 1309, in main
tests = find_tests(options)
  File "solrjmeter.py", line 461, in find_tests
with changed_dir(pattern):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
  File "solrjmeter.py", line 229, in changed_dir
os.chdir(new)
OSError: [Errno 20] Not a directory:
'/home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries'

Best,

Dmitry



On Wed, Jul 31, 2013 at 7:21 PM, Roman Chyla  wrote:

> Hi Dmitry,
> probably mistake in the readme, try calling it with -q
> /home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries
>
> as for the base_url, i was testing it on solr4.0, where it tries contacting
> /solr/admin/system - is it different for 4.3? I guess I should make it
> configurable (it already is, the endpoint is set at check_options())
>
> thanks
>
> roman
>
>
> On Wed, Jul 31, 2013 at 10:01 AM, Dmitry Kan  wrote:
>
> > Ok, got the error fixed by modifying the base solr url in solrjmeter.py
> > (added core name after /solr part).
> > Next error is:
> >
> > WARNING: no test name(s) supplied nor found in:
> > ['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries']
> >
> > It is a 'slow start with new tool' symptom I guess.. :)
> >
> >
> > On Wed, Jul 31, 2013 at 4:39 PM, Dmitry Kan 
> wrote:
> >
> >> Hi Roman,
> >>
> >> What  version and config of SOLR does the tool expect?
> >>
> >> Tried to run, but got:
> >>
> >> **ERROR**
> >>   File "solrjmeter.py", line 1390, in 
> >> main(sys.argv)
> >>   File "solrjmeter.py", line 1296, in main
> >> check_prerequisities(options)
> >>   File "solrjmeter.py", line 351, in check_prerequisities
> >> error('Cannot contact: %s' % options.query_endpoint)
> >>   File "solrjmeter.py", line 66, in error
> >> traceback.print_stack()
> >> Cannot contact: http://localhost:8983/solr
> >>
> >>
> >> complains about URL, clicking which leads properly to the admin page...
> >> solr 4.3.1, 2 cores shard
> >>
> >> Dmitry
> >>
> >>
> >> On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla  >wrote:
> >>
> >>> Hello,
> >>>
> >>> I have been wanting some tools for measuring performance of SOLR,
> similar
> >>> to Mike McCandles' lucene benchmark.
> >>>
> >>> so yet another monitor was born, is described here:
> >>>
> http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
> >>>
> >>> I tested it on the problem of garbage collectors (see the blogs for
> >>> details) and so far I can't conclude whether highly customized G1 is
> >>> better
> >>> than highly customized CMS, but I think interesting details can be seen
> >>> there.
> >>>
> >>> Hope this helps someone, and of course, feel free to improve the tool
> and
> >>> share!
> >>>
> >>> roman
> >>>
> >>
> >>
> >
>


Re: Measuring SOLR performance

2013-07-31 Thread Dmitry Kan
Ok, got the error fixed by modifying the base solr url in solrjmeter.py
(added core name after /solr part).
Next error is:

WARNING: no test name(s) supplied nor found in:
['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries']

It is a 'slow start with new tool' symptom I guess.. :)




Re: Measuring SOLR performance

2013-07-31 Thread Dmitry Kan
Hi Roman,

What  version and config of SOLR does the tool expect?

Tried to run, but got:

**ERROR**
  File "solrjmeter.py", line 1390, in 
main(sys.argv)
  File "solrjmeter.py", line 1296, in main
check_prerequisities(options)
  File "solrjmeter.py", line 351, in check_prerequisities
error('Cannot contact: %s' % options.query_endpoint)
  File "solrjmeter.py", line 66, in error
traceback.print_stack()
Cannot contact: http://localhost:8983/solr


It complains about the URL, though clicking it leads properly to the admin 
page... Solr 4.3.1, 2-core shard.

Dmitry


On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla  wrote:

> Hello,
>
> I have been wanting some tools for measuring performance of SOLR, similar
> to Mike McCandles' lucene benchmark.
>
> so yet another monitor was born, is described here:
> http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
>
> I tested it on the problem of garbage collectors (see the blogs for
> details) and so far I can't conclude whether highly customized G1 is better
> than highly customized CMS, but I think interesting details can be seen
> there.
>
> Hope this helps someone, and of course, feel free to improve the tool and
> share!
>
> roman
>


Re: DISCUSS: First official Solr documentation release, plan to vote on wed.

2013-07-24 Thread Dmitry Kan
thanks. There are a few more that I could handle before calling it a day.


On 24 July 2013 22:02, Shalin Shekhar Mangar  wrote:

>
> On Thu, Jul 25, 2013 at 12:26 AM, Dmitry Kan wrote:
>
>>
>> I started reading through and commenting on typos, where found. Do you
>> get notifs of those?
>>
>
> I fixed the two you reported recently.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: DISCUSS: First official Solr documentation release, plan to vote on wed.

2013-07-24 Thread Dmitry Kan
Hi Hoss,

I started reading through and commenting on typos, where found. Do you get
notifs of those?


On 24 July 2013 20:45, Chris Hostetter  wrote:

>
> : TL;DR: I plan to call a vote to formally release this doc in ~40 hours.
> : Please help out by reviewing/improving the content -- especially if one
> : of the
>
> It looks like all the new major additions are complete. I'll hold off on
> cutting the RC for a couple more hours to give folks some more time to
> review, looking for typos/tweaks -- I've updated my PDF snapshot so folks
> can review the final formatting, or you can browse the dynamic site and
> post any comments directly on the applicable pages...
>
>
> https://people.apache.org/~hossman/apache-solr-ref-guide-snapshot.pdf
> https://cwiki.apache.org/confluence/display/solr/
>
>
> -Hoss
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: DISCUSS: First official Solr documentation release, plan to vote on wed.

2013-07-24 Thread Dmitry Kan
OK, thanks, Hoss, I can get by with the comments. A jira is perhaps too
much for fixing a typo ;)


On 24 July 2013 18:32, Chris Hostetter  wrote:

>
> : Could I have write access to the pages? I thought to start by fixing a
> : couple of typos. Or should I leave them as comments?
>
> Please leave comments on specific pages, or open Jira issues for larger
> concerns.
>
> The doc is only directly editable by committers in order to be able to
> officially release it...
>
>
> https://cwiki.apache.org/confluence/display/solr/Internal+-+Maintaining+Documentation#Internal-MaintainingDocumentation-WhoCanEditThisDocumentation
>
>
> -Hoss
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: DISCUSS: First official Solr documentation release, plan to vote on wed.

2013-07-24 Thread Dmitry Kan
Erick:

it's this e-mail.


On 24 July 2013 15:00, Erick Erickson  wrote:

> Dmitry:
>
> What's your Confluence login?
>
> Erick
>
> On Wed, Jul 24, 2013 at 7:08 AM, Dmitry Kan 
> wrote:
> > Hi Hoss,
> >
> > Could I have write access to the pages? I thought to start by fixing a
> > couple of typos. Or should I leave them as comments?
> >
> > Thanks,
> > Dmitry Kan
> >
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: DISCUSS: First official Solr documentation release, plan to vote on wed.

2013-07-24 Thread Dmitry Kan
Hi Hoss,

Could I have write access to the pages? I thought to start by fixing a couple 
of typos. Or should I leave them as comments?

Thanks,
Dmitry Kan


On 23 July 2013 00:32, Chris Hostetter  wrote:

>
> Now that the 4.4 release vote is official and gradually making its way to
> the mirrors, I'd like to draw attention to the efforts that have been made
> to get the "Solr Reference Guide" into releasable shape, so that we can
> plan on holding a VOTE to release the "4.4" version ASAP...
>
> https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
> https://people.apache.org/~hossman/apache-solr-ref-guide-snapshot.pdf
>
> TL;DR: I plan to call a vote to formally release this doc in ~40 hours.
> Please help out by reviewing/improving the content -- especially if one of
> the subtasks of SOLR-5036 is assigned to you (because that means you already
> agreed you would).
>
> - - -
>
> For those of you who haven't been following along closely, there were a
> lot of hiccups related to the Apache Confluence instance that slowed us down
> on getting the RefGuide content donated by LucidWorks up and running and
> editable.  Some of those issues are still outstanding, notably because the
> ASF is still in the process of trying to upgrade to Confluence 5.x, and
> that process is on hold indefinitely (see SOLR-4618 and its subtasks for
> history), which means we may have to re-visit a lot of little things once
> the Confluence upgrade is complete.
>
> In the meantime the Ref Guide is in fact up and live and editable, and
> progress has been made on two fronts:
>
>  * Getting the doc up to date with Solr 4.4 (SOLR-5036)
>  * Getting a process in place to release/publish the doc with each
>    minor solr release.
>
> There are 4-5 major additions needed to the doc to cover Solr 4.4
> adequately; see the children of SOLR-5036 for details.  My hope is that the
> folks who volunteered to help out in writing those new sections of the doc
> can work on their edits over the next ~40 hours or so, and then on Wed
> morning (my time) I plan to call a VOTE to formally release this doc (I'm
> volunteering to be the "Doc RM" this time around)...
>
> http://s.apache.org/exs
>
> If you are one of the people assigned an outstanding doc issue, please
> speak up if you don't think you'll have your edits finished by then, or if
> you would like assistance with structure, wording or formatting...
>
>  * SOLR-5059 - sarowe - schemaless & schema REST
>  * SOLR-5060 - miller - HDFS
>  * SOLR-5061 - erick  - new solr.xml format
>  * SOLR-5062 - shalin - shard splitting review & new deleteshard cmd
>  * SOLR-5063 - grant  - UI screen for adding docs
>
> Even if you are not one of the people listed above, there is still a lot
> you can do to help out...
>
>  * Reviewing any/all sections of the guide looking for content
>improvements that can be made (edit or post comments directly on the
>affected pages as you find them)
>
>  * Reviewing the snapshot PDF looking for *general* formatting problems
>that need to be addressed. (reply to this thread or open a Jira)
>
>  * Reviewing & sanity checking the process docs for how we will be
>maintaining & releasing the guide looking for potential problems
>(edit or post comments if you see problems)...
>
> https://cwiki.apache.org/confluence/display/solr/Internal+-+Maintaining+Documentation
> https://cwiki.apache.org/confluence/display/solr/Internal+-+How+To+Publish+This+Documentation
>
>
> If anyone has any general comments / concerns, please reply here.
>
> -Hoss
>
> -
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: for those of you using gmail...

2013-07-17 Thread Dmitry Kan
Hi Mike,

The first search gives zero hits, but the second shows an "infinite" number
of results, dating back to June and earlier.

It could be that, if the bug holds, it gets rolled out in steps to
different zones: you reside in the States, I'm in Finland. Let's see if the
bug hits this coast.

Dmitry


On 17 July 2013 17:26, Michael McCandless  wrote:

> Can you try this search in your gmail:
>
> from:jenk...@thetaphi.de regression "build 6605"
>
> And let me know if you get 1 or 0 results back?
>
> I get 0 results back but I should get 1, I think.
>
> Furthermore, if I search for:
>
> from:jenk...@thetaphi.de regression
>
> I only get results up to Jul 2, even though there are many build
> failures after that.
>
> It's as if on Jul 2 Google made regression an index-time-only
> stopword; failed, replication, handler also became stopwords (but
> apparently at different times).
>
> Frustrating ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module

2013-07-15 Thread Dmitry Kan
Hello guys,

Indeed, the GWT port is work in progress and far from done. The driving
factor here was to be able to later integrate luke into the solr admin as
well as have the standalone webapp for non-solr users.
There is (was?) a luke stats handler in the solr ui that printed some
stats on the index. That could be substituted with the GWT app.

The code isn't yet ready to see the light. So if it makes more sense for
Ajay to work on the existing jira with the Apache Pivot implementation, I
would say go ahead.

In the current port effort (the aforementioned github fork) the UI is the
original one, developed by Andrzej.  Besides the UI rework there are plenty
of things to port / verify (e.g. the Hadoop plugin) against the latest
lucene versions.

See the readme.md: https://github.com/dmitrykey/luke


Whichever way is taken, hopefully we end up having stable releases of luke :)

Dmitry Kan


On 14 July 2013 22:38, Andrzej Bialecki  wrote:

> On 7/14/13 5:04 AM, Ajay Bhat wrote:
>
>> Shawn and Andrzej,
>>
>> Thanks for answering my questions. I've looked over the code done by
>> Dmitry and I'll look into what I can do to help with the UI porting in
>> future.
>>
>> I was actually thinking of doing this JIRA as a project by myself with
>> some assistance from the community after getting a mentor for the ASF
>> ICFOSS program, which I haven't found yet. It would be great if I could
>> get one of you guys as a mentor.
>>
>> As the UI work has been mostly done by others like Dmitry Kan, I don't
>> think I need to work on that majorly for now.
>>
>
> It's far from done - he just started the process.
>
>
>> What other work is there to be done that I can do as a project? Any new
>> features or improvements?
>>
>> Regards,
>> Ajay
>>
>> On Jul 14, 2013 1:54 AM, "Andrzej Bialecki"  wrote:
>>
>> On 7/13/13 8:56 PM, Shawn Heisey wrote:
>>
>> On 7/13/2013 3:15 AM, Ajay Bhat wrote:
>>
>> One more question : What version of Lucene does Luke
>> currently support
>> right now? I saw a comment on the issue page that it doesn't
>> support the
>> Lucene 4.1 and 4.2 trunk.
>>
>>
>> The official Luke project only has versions up through
>> 4.0.0-ALPHA.
>>
>> http://code.google.com/p/luke/
>>
>> There is a forked project that has produced Luke for newer
>> Lucene versions.
>>
>> https://java.net/projects/opengrok/downloads
>>
>> I can't seem to locate any information about how they have
>> licensed the
>> newer versions, and I'm not really sure where the source code is
>> living.
>>
>> Regarding a question you asked earlier, Luke is a standalone
>> program.
>> It does include Lucene classes in the "lukeall" version of the
>> executable jar.
>>
>> Luke may have some uses as a library, but I think that most
>> people run
>> it separately.  There is partial Luke functionality embedded in
>> the Solr
>> admin UI, but I don't know whether that is something cooked up
>> by Solr
>> devs or if it shares actual code with Luke.
>>
>>
>> Ajay,
>>
>> Luke is a standalone GUI application, not a library. It uses a
>> custom version of Thinlet GUI toolkit, which is no longer
>> maintained, and it's LGPL licensed, so Luke can't be contributed to
>> the Lucene project as is.
>>
>> Recently several people expressed interest in porting Luke to some
>> other GUI toolkit that is Apache-friendly. See the discussion here:
>>
>> http://groups.google.com/d/msg/luke-discuss/S_Whwg2jwmA/9JgqKIe5aiwJ
>>
>> In particular, there's a fork by Dmitry Kan - he plans to integrate
>> other patches and forks, and to port Luke from Thinlet to GWT and
>> sync it with the latest version

[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-24 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Affects Version/s: 4.3.1

> Solr sends all doc ids to all shards in the query counting facets
> -
>
> Key: SOLR-4903
> URL: https://issues.apache.org/jira/browse/SOLR-4903
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.4, 4.3, 4.3.1
>    Reporter: Dmitry Kan
>
> Setup: front end solr and shards.
> Summary: solr frontend sends all doc ids received from QueryComponent to all 
> shards which causes POST request buffer size overflow.
> Symptoms:
> The query is: http://pastebin.com/0DndK1Cs
> I have omitted the shards parameter.
> The router log: http://pastebin.com/FTVH1WF3
> Notice the port of the shard that is affected. That port changes all the time, 
> even for the same request.
> The log entry is prepended with lines:
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
> Internal Server Error
> (they are not in the pastebin link)
> The shard log: http://pastebin.com/exwCx3LX
> Suggestion: change the data structure in FacetComponent to send only doc ids 
> that belong to a shard and not a concatenation of all doc ids.
> Why is this important: for scaling. Adding more shards will result in 
> overflowing the POST request buffer at some point anyway.
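
For illustration, a rough sketch of the suggested change (the helper names
here are hypothetical, this is not the actual FacetComponent code): group
the returned ids by their source shard, so each refinement request carries
only the ids that shard owns.

[code]
// resultDocs: hits collected by QueryComponent; Solr's ShardDoc keeps
// track of which shard a document came from (its "shard" field).
Map<String, List<String>> idsByShard = new HashMap<String, List<String>>();
for (ShardDoc sdoc : resultDocs) {
  List<String> ids = idsByShard.get(sdoc.shard);
  if (ids == null) {
    ids = new ArrayList<String>();
    idsByShard.put(sdoc.shard, ids);
  }
  ids.add(sdoc.id.toString());
}
// each shard now receives only its own ids (sendRefinementRequest is hypothetical)
for (Map.Entry<String, List<String>> e : idsByShard.entrySet()) {
  sendRefinementRequest(e.getKey(), e.getValue());
}
[/code]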

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: PMC Chair change

2013-06-24 Thread Dmitry Kan
Congratulations, Uwe!
All the best in your new role.

Dmitry Kan

On 22 June 2013 11:47, Uwe Schindler  wrote:

> Hi all,
>
> most of you will already know it: Since June 19, 2013, I am the new
> Project Management Committee Chair, replacing Steve Rowe. I am glad to
> manage all the legal stuff for new committers or contributions from
> external entities - and also preparing the board reports. All this, of
> course and as always, with included but not deliberate @UweSays quotations.
>
> Many thanks to all PMC members who have voted for me!
> Many thanks to Steve for the help and hints to all the things I have to
> know in my new role!
>
> Uwe
>
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Chair / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Estimating Solr memory requirements

2013-06-21 Thread Dmitry Kan
I was about to mention this too, as the RAM prediction was way lower than
the one in reality. But I very much like the idea of the tool. Possibly,
integrating it into the admin UI at some point later would bring more light
to it.
On 21 Jun 2013 03:55, "Erick Erickson"  wrote:

> Gaaah, I've been looking at this more today and it's completely not
> ready for prime-time. Numbers are bogus, so don't rely on it much.
>
> On Thu, Jun 20, 2013 at 10:48 AM, Erick Erickson
>  wrote:
> > 1> MB, you're right. As usual, fresh eyes see stuff I _should_ have
> > seen. If you look at the dump button, it's supposed to show how the
> > figure was arrived at. If it doesn't, let me know. This still means it
> > shouldn't be misleading
> >
> > 2> Right, this is the average number of bytes for the field you're
> > defining per document. This won't particularly change the memory
> > requirements _unless_ you have your document cache turned way up.
> > The tooltip when you hover over the entry field should be expanded to
> > clarify this.
> >
> > Thanks for the feedback, I'll take care of this this weekend probably.
> >
> > Erick
> >
> > On Thu, Jun 20, 2013 at 12:38 AM, Dmitry Kan 
> wrote:
> >> Hello Erick,
> >>
> >> Tried your tool and have a couple of questions:
> >>
> >> 1. The total section: is the final figure presented in bytes (less
> probable)
> >> or megabytes (more probable)?
> >>
> >> -Total-
> >>
> >> TOTAL (MB):  [3,895 (B)]
> >>
> >> 2. Defined a text field with custom type. Changing "Average text bytes
> ONLY
> >> if stored" didn't change the total. Probably correct? Does the field
> mean
> >> the raw byte size of a document?
> >>
> >> Thanks,
> >>
> >> Dmitry
> >>
> >> On 20 June 2013 09:52, Dmitry Kan  wrote:
> >>>
> >>> No worries.
> >>>
> >>> Otherwise the effort is very useful. This is the first question we
> usually
> >>> get from our superiors: how much RAM would we need to launch a feature?
> >>>
> >>>
> >>> On 20 June 2013 00:36, Erick Erickson  wrote:
> >>>>
> >>>> Nope, never even noticed it until now. That's the right URL though,
> >>>> typo and all
> >>>>
> >>>> Someday I may even fix it ...
> >>>>
> >>>> Thanks,
> >>>> Erick
> >>>>
> >>>> On Wed, Jun 19, 2013 at 3:35 PM, Dmitry Kan 
> >>>> wrote:
> >>>> > Hi Erick,
> >>>> >
> >>>> > Is typo in the title on purpose?
> >>>> >
> >>>> >
> >>>> > On 19 June 2013 15:09, Erick Erickson 
> wrote:
> >>>> >>
> >>>> >> OK, I seem to have stalled on this. Over part of the winter, I put
> >>>> >> together a Swing-based program to help estimate Solr/Lucene memory
> >>>> >> requirements, with all the usual caveats see:
> >>>> >> https://github.com/ErickErickson/SolrMemoryEsitmator.
> >>>> >>
> >>>> >> I have notes to myself that it's still deficient in several areas:
> >>>> >> FieldValueCache estimates
> >>>> >> tlog requirements
> >>>> >> Memory required to re-open a searcher
> >>>> >> Position and term vector memory requirements
> >>>> >> And whatever I haven't thought about yet.
> >>>> >>
> >>>> >> Of course it builds on Grant's spreadsheet (reads "steals from it
> >>>> >> shamelessly!") I'm hoping to have a friendlier interface. And _of
> >>>> >> course_ I'd be willing to donate it to Solr as a
> util/contrib/whatever
> >>>> >> if it fits.
> >>>> >>
> >>>> >> So, what I'm about here is a few things:
> >>>> >>
> >>>> >> > Anyone who wants to try it feel free. The build instructions are
> at
> >>>> >> > the
> >>>> >> > above, but the short form is to clone it, "ant jar" and "java
> -jar
> >>>> >> > dist/estimator.jar". Enter some field info and hit the "Add/Save"
> >>&

Re: Estimating Solr memory requirements

2013-06-20 Thread Dmitry Kan
Hello Erick,

Tried your tool and have a couple of questions:

1. The total section: is the final figure presented in bytes (less
probable) or megabytes (more probable)?

-Total-

TOTAL (MB):  [3,895 (B)]

2. Defined a text field with custom type. Changing "Average text bytes ONLY
if stored" didn't change the total. Probably correct? Does the field mean
the raw byte size of a document?

Thanks,

Dmitry

On 20 June 2013 09:52, Dmitry Kan  wrote:

> No worries.
>
> Otherwise the effort is very useful. This is the first question we usually
> get from our superiors: how much RAM would we need to launch a feature?
>
>
> On 20 June 2013 00:36, Erick Erickson  wrote:
>
>> Nope, never even noticed it until now. That's the right URL though,
>> typo and all
>>
>> Someday I may even fix it ...
>>
>> Thanks,
>> Erick
>>
>> On Wed, Jun 19, 2013 at 3:35 PM, Dmitry Kan 
>> wrote:
>> > Hi Erick,
>> >
>> > Is typo in the title on purpose?
>> >
>> >
>> > On 19 June 2013 15:09, Erick Erickson  wrote:
>> >>
>> >> OK, I seem to have stalled on this. Over part of the winter, I put
>> >> together a Swing-based program to help estimate Solr/Lucene memory
>> >> requirements, with all the usual caveats see:
>> >> https://github.com/ErickErickson/SolrMemoryEsitmator.
>> >>
>> >> I have notes to myself that it's still deficient in several areas:
>> >> FieldValueCache estimates
>> >> tlog requirements
>> >> Memory required to re-open a searcher
>> >> Position and term vector memory requirements
>> >> And whatever I haven't thought about yet.
>> >>
>> >> Of course it builds on Grant's spreadsheet (reads "steals from it
>> >> shamelessly!") I'm hoping to have a friendlier interface. And _of
>> >> course_ I'd be willing to donate it to Solr as a util/contrib/whatever
>> >> if it fits.
>> >>
>> >> So, what I'm about here is a few things:
>> >>
>> >> > Anyone who wants to try it feel free. The build instructions are at
>> the
>> >> > above, but the short form is to clone it, "ant jar" and "java -jar
>> >> > dist/estimator.jar". Enter some field info and hit the "Add/Save"
>> button
>> >> > then hit the "Dump calcs" button to see what it does currently.
>> >>
>> >> It also saves the estimates away in a file and shows all the steps it
>> >> goes through to perform the calculations. It'll also make rudimentary
>> >> field definitions from the entered data. You can come back to it later
>> >> and add to what you've already done.
>> >>
>> >> > Make any improvements you see fit, particularly to flesh out the
>> >> > deficiencies listed above.
>> >>
>> >> > Anyone who has, you know, graphic design/Swing skills please feel
>> free
>> >> > to make it better. I'm a newbie as far as using Swing is concerned,
>> and the
>> >> > way I align buttons and checkboxes is pretty hacky. But it works
>> >>
>> >> > Any suggestions anyone wants to make. Suggestions in code are nicest
>> of
>> >> > course, but algorithms for calculating, say, position and tv memory
>> usage
>> >> > would be great as well! Isolated code snippets that I could
>> incorporate
>> >> > would be great too.
>> >>
>> >> > Any info where I've gotten the calculations wrong or don't show
>> enough
>> >> > info to actually figure out whether they're correct or not.
>> >>
>> >> Note that the goal for this is to give a rough idea of memory
>> >> requirements and be easy to use. The spreadsheet is a bit daunting to
>> >> someone who knows nothing about Solr so this might be an easier way to
>> >> get into it.
>> >>
>> >> Thanks,
>> >> Erick
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>


Re: Estimating Solr memory requirements

2013-06-19 Thread Dmitry Kan
No worries.

Otherwise the effort is very useful. This is the first question we usually
get from our superiors: how much RAM would we need to launch a feature?


On 20 June 2013 00:36, Erick Erickson  wrote:

> Nope, never even noticed it until now. That's the right URL though,
> typo and all
>
> Someday I may even fix it ...
>
> Thanks,
> Erick
>
> On Wed, Jun 19, 2013 at 3:35 PM, Dmitry Kan 
> wrote:
> > Hi Erick,
> >
> > Is typo in the title on purpose?
> >
> >
> > On 19 June 2013 15:09, Erick Erickson  wrote:
> >>
> >> OK, I seem to have stalled on this. Over part of the winter, I put
> >> together a Swing-based program to help estimate Solr/Lucene memory
> >> requirements, with all the usual caveats see:
> >> https://github.com/ErickErickson/SolrMemoryEsitmator.
> >>
> >> I have notes to myself that it's still deficient in several areas:
> >> FieldValueCache estimates
> >> tlog requirements
> >> Memory required to re-open a searcher
> >> Position and term vector memory requirements
> >> And whatever I haven't thought about yet.
> >>
> >> Of course it builds on Grant's spreadsheet (reads "steals from it
> >> shamelessly!") I'm hoping to have a friendlier interface. And _of
> >> course_ I'd be willing to donate it to Solr as a util/contrib/whatever
> >> if it fits.
> >>
> >> So, what I'm about here is a few things:
> >>
> >> > Anyone who wants to try it feel free. The build instructions are at
> the
> >> > above, but the short form is to clone it, "ant jar" and "java -jar
> >> > dist/estimator.jar". Enter some field info and hit the "Add/Save"
> button
> >> > then hit the "Dump calcs" button to see what it does currently.
> >>
> >> It also saves the estimates away in a file and shows all the steps it
> >> goes through to perform the calculations. It'll also make rudimentary
> >> field definitions from the entered data. You can come back to it later
> >> and add to what you've already done.
> >>
> >> > Make any improvements you see fit, particularly to flesh out the
> >> > deficiencies listed above.
> >>
> >> > Anyone who has, you know, graphic design/Swing skills please feel free
> >> > to make it better. I'm a newbie as far as using Swing is concerned,
> and the
> >> > way I align buttons and checkboxes is pretty hacky. But it works
> >>
> >> > Any suggestions anyone wants to make. Suggestions in code are nicest
> of
> >> > course, but algorithms for calculating, say, position and tv memory
> usage
> >> > would be great as well! Isolated code snippets that I could
> incorporate
> >> > would be great too.
> >>
> >> > Any info where I've gotten the calculations wrong or don't show enough
> >> > info to actually figure out whether they're correct or not.
> >>
> >> Note that the goal for this is to give a rough idea of memory
> >> requirements and be easy to use. The spreadsheet is a bit daunting to
> >> someone who knows nothing about Solr so this might be an easier way to
> >> get into it.
> >>
> >> Thanks,
> >> Erick
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Estimating Solr memory requirements

2013-06-19 Thread Dmitry Kan
Hi Erick,

Is typo in the title on purpose?


On 19 June 2013 15:09, Erick Erickson  wrote:

> OK, I seem to have stalled on this. Over part of the winter, I put
> together a Swing-based program to help estimate Solr/Lucene memory
> requirements, with all the usual caveats see:
> https://github.com/ErickErickson/SolrMemoryEsitmator.
>
> I have notes to myself that it's still deficient in several areas:
> FieldValueCache estimates
> tlog requirements
> Memory required to re-open a searcher
> Position and term vector memory requirements
> And whatever I haven't thought about yet.
>
> Of course it builds on Grant's spreadsheet (reads "steals from it
> shamelessly!") I'm hoping to have a friendlier interface. And _of
> course_ I'd be willing to donate it to Solr as a util/contrib/whatever
> if it fits.
>
> So, what I'm about here is a few things:
>
> > Anyone who wants to try it feel free. The build instructions are at the
> above, but the short form is to clone it, "ant jar" and "java -jar
> dist/estimator.jar". Enter some field info and hit the "Add/Save" button
> then hit the "Dump calcs" button to see what it does currently.
>
> It also saves the estimates away in a file and shows all the steps it
> goes through to perform the calculations. It'll also make rudimentary
> field definitions from the entered data. You can come back to it later
> and add to what you've already done.
>
> > Make any improvements you see fit, particularly to flesh out the
> deficiencies listed above.
>
> > Anyone who has, you know, graphic design/Swing skills please feel free
> to make it better. I'm a newbie as far as using Swing is concerned, and the
> way I align buttons and checkboxes is pretty hacky. But it works
>
> > Any suggestions anyone wants to make. Suggestions in code are nicest of
> course, but algorithms for calculating, say, position and tv memory usage
> would be great as well! Isolated code snippets that I could incorporate
> would be great too.
>
> > Any info where I've gotten the calculations wrong or don't show enough
> info to actually figure out whether they're correct or not.
>
> Note that the goal for this is to give a rough idea of memory
> requirements and be easy to use. The spreadsheet is a bit daunting to
> someone who knows nothing about Solr so this might be an easier way to
> get into it.
>
> Thanks,
> Erick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Dmitry Kan
After adding:

<build-directory>target</build-directory>

the war file is put into the target subdir.


On a side note:

running solr with the maven jetty plugin seems to work, which required two
artifacts (couldn't figure out where jetty stores the lib dir in this
mode):

command: mvn jetty:run-war

(configured in the jetty-maven-plugin):

  

  ch.qos.logback
  logback-classic
  1.0.13


  tomcat
  commons-logging
  4.0.6

  


when starting the webapp, however, solr tries to create a collection1:

17:02:53.108 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.core.CoreContainer - Creating SolrCore 'collection1' using
instanceDir: ${top-level}/solr/example/solr/collection1

Apparently, ${top-level} var isn't defined either.




On 19 June 2013 16:25, Dmitry Kan  wrote:

> also: ${build-directory} is not set anywhere in the project.
>
>
> On 19 June 2013 16:23, Dmitry Kan  wrote:
>
>> Hello,
>>
>> executing 'package' on Apache Solr Search Server pom
>> (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.
>>
>> Apache Maven 3.0.4
>> OS: Ubuntu 12.04 LTS
>>
>> Thanks,
>>
>> Dmitry Kan
>>
>
>


Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Dmitry Kan
also: ${build-directory} is not set anywhere in the project.


On 19 June 2013 16:23, Dmitry Kan  wrote:

> Hello,
>
> executing 'package' on Apache Solr Search Server pom
> (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.
>
> Apache Maven 3.0.4
> OS: Ubuntu 12.04 LTS
>
> Thanks,
>
> Dmitry Kan
>


[lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Dmitry Kan
Hello,

executing 'package' on Apache Solr Search Server pom
(maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.

Apache Maven 3.0.4
OS: Ubuntu 12.04 LTS

Thanks,

Dmitry Kan


[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2013-06-18 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686964#comment-13686964
 ] 

Dmitry Kan commented on SOLR-1726:
--

"Scrolling is not intended for real time user requests, it is intended for 
cases like scrolling over large portions of data that exists within 
elasticsearch to reindex it for example."

are there any other applications for this besides re-indexing?

Also, is it known how the scrolling is implemented internally, i.e. is it
efficient in transferring to the client only what is needed?

> Deep Paging and Large Results Improvements
> --
>
> Key: SOLR-1726
> URL: https://issues.apache.org/jira/browse/SOLR-1726
> Project: Solr
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.4
>
> Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
> ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, 
> SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java
>
>
> There are possibly ways to improve collections of "deep paging" by passing 
> Solr/Lucene more information about the last page of results seen, thereby 
> saving priority queue operations.   See LUCENE-2215.
> There may also be better options for retrieving large numbers of rows at a 
> time that are worth exploring.  LUCENE-2127.
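
For reference, the LUCENE-2215 work mentioned above surfaced as
IndexSearcher.searchAfter (since Lucene 3.5). A minimal sketch of
cursor-style deep paging with it; the index path, field and term are
illustrative:

[code]
import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;

public class DeepPagingDemo {
  public static void main(String[] args) throws Exception {
    DirectoryReader reader = DirectoryReader.open(FSDirectory.open(new File("index")));
    IndexSearcher searcher = new IndexSearcher(reader);
    Query query = new TermQuery(new Term("body", "lucene"));

    ScoreDoc last = null; // the "last page seen" cursor
    while (true) {
      // needs only a 100-slot priority queue per page, however deep we go
      TopDocs page = searcher.searchAfter(last, query, 100);
      if (page.scoreDocs.length == 0) break;
      for (ScoreDoc sd : page.scoreDocs) {
        // process searcher.doc(sd.doc) ...
      }
      last = page.scoreDocs[page.scoreDocs.length - 1];
    }
    reader.close();
  }
}
[/code]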

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2082) Performance improvement for merging posting lists

2013-06-14 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683303#comment-13683303
 ] 

Dmitry Kan commented on LUCENE-2082:


hi [~whzz],

Would you be potentially interested in another postings lists idea that came up 
recently?

http://markmail.org/message/6ro7bbez3v3y5mfx#query:+page:1+mid:tywtrjjcfdbzww6f+state:results

It can have quite a high impact on the index size, and it is hopefully relatively 
easy to start an experiment using the lucene codec technology.

Just in case you get interested.

> Performance improvement for merging posting lists
> -
>
> Key: LUCENE-2082
> URL: https://issues.apache.org/jira/browse/LUCENE-2082
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael Busch
>Priority: Minor
>  Labels: gsoc2013
> Fix For: 4.4
>
>
> A while ago I had an idea about how to improve the merge performance
> for posting lists. This is currently by far the most expensive part of
> segment merging due to all the VInt de-/encoding. Not sure if an idea
> for improving this was already mentioned in the past?
> So the basic idea is it to perform a raw copy of as much posting data
> as possible. The reason why this is difficult is that we have to
> remove deleted documents. But often the fraction of deleted docs in a
> segment is rather low (<10%?), so it's likely that there are quite
> long consecutive sections without any deletions.
> To find these sections we could use the skip lists. Basically at any
> point during the merge we would find the skip entry before the next
> deleted doc. All entries to this point can be copied without
> de-/encoding of the VInts. Then for the section that has deleted docs
> we perform the "normal" way of merging to remove the deletes. Then we
> check again with the skip lists if we can raw copy the next section.
> To make this work there are a few different necessary changes:
> 1) Currently the multilevel skiplist reader/writer can only deal with 
> fixed-size
> skips (16 on the lowest level). It would be an easy change to allow
> variable-size skips, but then the MultiLevelSkipListReader can't
> return numSkippedDocs anymore, which SegmentTermDocs needs -> change 2)
> 2) Store the last docID in which a term occurred in the term
> dictionary. This would also be beneficial for other use cases. By
> doing that the SegmentTermDocs#next(), #read() and #skipTo() know when
> the end of the postinglist is reached. Currently they have to track
> the df, which is why after a skip it's important to take the
> numSkippedDocs into account.
> 3) Change the merging algorithm according to my description above. It's
> important to create a new skiplist entry at the beginning of every
> block that is copied in raw mode, because its next skip entry's values
> are deltas from the beginning of the block. Also the very first posting, and
> that one only, needs to be decoded/encoded to make sure that the
> payload length is explicitly written (i.e. must not depend on the
> previous length). Also such a skip entry has to be created at the
> beginning of each source segment's posting list. With change 2) we don't
> have to worry about the positions of the skip entries. And having a few
> extra skip entries in merged segments won't hurt much.
> If a segment has no deletions at all this will avoid any
> decoding/encoding of VInts (best case). I think it will also work
> great for segments with a rather low amount of deletions. We should
> probably then have a threshold: if the number of deletes exceeds this
> threshold we should fall back to old style merging.
> I haven't implemented any of this, so there might be complications I
> haven't thought about. Please let me know if you can think of reasons
> why this wouldn't work or if you think more changes are necessary.
> I will probably not have time to work on this soon, but I wanted to
> open this issue to not forget about it :). Anyone should feel free to
> take this!
> Btw: I think the flex-indexing branch would be a great place to try this
> out as a new codec. This would also be good to figure out what APIs
> are needed to make merging fully flexible as well.
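
To make the proposed control flow concrete, a very rough sketch; all the
types and methods below are hypothetical, not existing Lucene APIs:

[code]
// Raw-copy postings blocks that contain no deletions; fall back to the
// "normal" decode/re-encode path only around deleted docs.
void mergePostingList(SegmentPostings src, Bits deletedDocs, PostingsWriter out) {
  long pos = src.start();
  out.writeSkipEntry(pos); // change 3: skip entry at the start of the source list
  while (pos < src.end()) {
    // last skip entry before the next deleted doc (change 1: variable-size
    // skips; change 2: the term dictionary tells us where the list ends)
    long cleanEnd = src.skipList().lastEntryBefore(src.nextDeletedDocAfter(pos, deletedDocs));
    if (cleanEnd > pos) {
      out.writeSkipEntry(pos);              // new skip entry at the raw block start
      out.copyRawBytes(src, pos, cleanEnd); // no VInt de-/encoding in between
      pos = cleanEnd;
    } else {
      pos = out.reencodeOne(src, pos, deletedDocs); // normal merge step
    }
  }
}
[/code]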

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Doubt In Apache Solr 4.3.0

2013-06-14 Thread Dmitry Kan
Hi,
You should post questions about solr / lucene usage to the user
list, not here. This is the developers list.

http://lucene.apache.org/solr/discussion.html#solr-user-list-solr-userlucene


2013/6/14 vignesh 

>  Hi Team,
>
>  
>
> I am Vignesh, now working with Apache Solr 4.3.0. I have
> indexed data and am able to search using the query.
>
> How do I carry out Fuzzy Search in Solr 4.3.0? Can you guide me through this
> process?
>
>
>
>
>
> Thanks & Regards.
>
> Vignesh.V
>
>
> Ninestars Information Technologies Limited.,
>
> 72, Greams Road, Thousand Lights, Chennai - 600 006. India.
>
> Landline : +91 44 2829 4226 / 36 / 56   X: 144
>
> www.ninestars.in 
>

Re: postings lists deduplication

2013-06-07 Thread Dmitry Kan
Thanks for your input Walter, it is valuable.


I have noticed an abruptly cut message of mine earlier in the chain:

For the reverse expansion idea, which I personally like as well, we could
open up an opportunity for folks who experiment with surface form
generation based on POS tags and other grammatical features from lemmas at
query time.

Where should we take this? I was thinking of setting up a codec experiment,
if that is a good starting point.

Dmitry


2013/6/6 Walter Underwood 

> I've seen this behavior in commercial tokenizers and stemmers that I've
> used in other products. I would not be surprised if the Basistech package
> for Lucene did this.
>
> wunder
>
> On Jun 6, 2013, at 8:44 AM, Dmitry Kan wrote:
>
> Walter,
>
> How are cases like (won't -> will not) handled now? Doesn't it depend
> on the tokenizer before the stemmer kicks in? I.e. in the example, if ' gets
> removed by the tokenizer we end up having won and t as separate tokens. Is
> there any lucene filter able to do the expansion?
>
> Dmitry
>
>
> 2013/6/6 Walter Underwood 
>
>> Stemming is not 1:1. There are contractions that go to two words (won't
>> -> will not), German decompounding can create a nearly arbitrary number of
>> subwords, and there are two-token sequences that stem to a single word.
>>
>> Synonyms also are often multi-word. I just added a symmetrical synonym
>> for "A&M" and "A & M" to our college name search.
>>
>> wunder
>>
>> On Jun 6, 2013, at 4:01 AM, Dmitry Kan wrote:
>>
>> Thanks!
>>
>> Yes, it could be that allowing single term to point to several posting
>> lists is good e.g. for synonyms. So that there would be a single entry
>> point for one synonym (term) of the synonym set and it would find all doc
>> ids where synonyms of the entry point occur. Or is it being done like this
>> already?
>>
>> For the exact / inexact matching, the implementation we have now would
>> suggest all surface forms occurred in the doc corpus of a word and its stem
>> to be pointing to a single posting list. Which potentially makes the
>> inverted index more compact. But maybe maintaining N lists + mergesort is
>> faster?
>>
>> For the reverse expansion idea, which I personally like as well, we could
>>
>>
>> 2013/6/6 Michael McCandless 
>>
>>> Neat idea!
>>>
>>> Would this idea allow a single term to point to (the union of) N other
>>> posting lists?  It seems like that's necessary e.g. to handle the
>>> exact/inexact case.
>>>
>>> And then, to produce the Docs/AndPositionsEnum you'd need to do the
>>> merge sort across those N posting lists?
>>>
>>> Such a thing might also be do-able as runtime only wrapper around the
>>> postings API (FieldsProducer), if you could at runtime do the reverse
>>> expansion (e.g. stem -> all of its surface forms).
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Thu, Jun 6, 2013 at 3:51 AM, Dmitry Kan 
>>> wrote:
>>>
>>> > Robert Muir and I have discussed what Robert eventually named "postings
>>> > lists deduplication" at bbuzz 2013 conference in Berlin.
>>> >
>>> > The idea is to allow multiple terms to point to the same postings list
>>> to
>>> > save space.
>>> >
>>> > The application / impact of this is positive for synonyms, exact /
>>> inexact
>>> > terms, leading wildcard support via storing reversed term etc.
>>> >
>>> > At the moment, when supporting exact (unstemmed) and inexact (stemmed)
>>> > searches, we store both unstemmed and stemmed variant of a word form
>>> and
>>> > that leads to index bloating. For example, we had to remove the leading
>>> > wildcard support via reversing a token on index and query time because
>>> of
>>> > the same index size considerations.
>>> >
>>> > Would you like a jira for this?
>>> >
>>> > Thanks,
>>> >
>>> > Dmitry Kan
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>>
>>  --
>> Walter Underwood
>> wun...@wunderwood.org
>>
>>
>>
>>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: postings lists deduplication

2013-06-06 Thread Dmitry Kan
Walter,

How are cases like (won't -> will not) handled now? Doesn't it depend
on the tokenizer before the stemmer kicks in? I.e. in the example, if ' gets
removed by the tokenizer we end up having won and t as separate tokens. Is
there any lucene filter able to do the expansion?

Dmitry


2013/6/6 Walter Underwood 

> Stemming is not 1:1. There are contractions that go to two words (won't ->
> will not), German decompounding can create a nearly arbitrary number of
> subwords, and there are two-token sequences that stem to a single word.
>
> Synonyms also are often multi-word. I just added a symmetrical synonym for
> "A&M" and "A & M" to our college name search.
>
> wunder
>
> On Jun 6, 2013, at 4:01 AM, Dmitry Kan wrote:
>
> Thanks!
>
> Yes, it could be that allowing single term to point to several posting
> lists is good e.g. for synonyms. So that there would be a single entry
> point for one synonym (term) of the synonym set and it would find all doc
> ids where synonyms of the entry point occur. Or is it being done like this
> already?
>
> For the exact / inexact matching, the implementation we have now would
> suggest all surface forms occurred in the doc corpus of a word and its stem
> to be pointing to a single posting list. Which potentially makes the
> inverted index more compact. But maybe maintaining N lists + mergesort is
> faster?
>
> For the reverse expansion idea, which I personally like as well, we could
>
>
> 2013/6/6 Michael McCandless 
>
>> Neat idea!
>>
>> Would this idea allow a single term to point to (the union of) N other
>> posting lists?  It seems like that's necessary e.g. to handle the
>> exact/inexact case.
>>
>> And then, to produce the Docs/AndPositionsEnum you'd need to do the
>> merge sort across those N posting lists?
>>
>> Such a thing might also be do-able as runtime only wrapper around the
>> postings API (FieldsProducer), if you could at runtime do the reverse
>> expansion (e.g. stem -> all of its surface forms).
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Jun 6, 2013 at 3:51 AM, Dmitry Kan 
>> wrote:
>>
>> > Robert Muir and I have discussed what Robert eventually named "postings
>> > lists deduplication" at bbuzz 2013 conference in Berlin.
>> >
>> > The idea is to allow multiple terms to point to the same postings list
>> to
>> > save space.
>> >
>> > The application / impact of this is positive for synonyms, exact /
>> inexact
>> > terms, leading wildcard support via storing reversed term etc.
>> >
>> > At the moment, when supporting exact (unstemmed) and inexact (stemmed)
>> > searches, we store both unstemmed and stemmed variant of a word form and
>> > that leads to index bloating. For example, we had to remove the leading
>> > wildcard support via reversing a token on index and query time because
>> of
>> > the same index size considerations.
>> >
>> > Would you like a jira for this?
>> >
>> > Thanks,
>> >
>> > Dmitry Kan
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: postings lists deduplication

2013-06-06 Thread Dmitry Kan
Mike, Robert,

Is *Pluggable Codec* a good way for setting up this postings format experiment?

Dmitry
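
For reference, the pluggable codec route would start from a custom
PostingsFormat. A minimal skeleton against the Lucene 4.x codec API that
just delegates to Lucene41; the dedup logic itself would go into wrappers
around the consumer/producer, and the "Dedup" name is illustrative:

[code]
import java.io.IOException;
import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

public class DedupPostingsFormat extends PostingsFormat {
  private final PostingsFormat delegate = new Lucene41PostingsFormat();

  public DedupPostingsFormat() {
    super("Dedup"); // the name resolved via SPI at read time
  }

  @Override
  public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
    // wrap the delegate's consumer here to share one postings list across terms
    return delegate.fieldsConsumer(state);
  }

  @Override
  public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
    // or wrap the producer here for a read-time stem -> surface-forms expansion
    return delegate.fieldsProducer(state);
  }
}
[/code]

The class then gets listed in
META-INF/services/org.apache.lucene.codecs.PostingsFormat and can be
selected per field via PerFieldPostingsFormat.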


2013/6/6 Dmitry Kan 

> Thanks!
>
> Yes, it could be that allowing single term to point to several posting
> lists is good e.g. for synonyms. So that there would be a single entry
> point for one synonym (term) of the synonym set and it would find all doc
> ids where synonyms of the entry point occur. Or is it being done like this
> already?
>
> For the exact / inexact matching, the implementation we have now would
> suggest all surface forms occurred in the doc corpus of a word and its stem
> to be pointing to a single posting list. Which potentially makes the
> inverted index more compact. But maybe maintaining N lists + mergesort is
> faster?
>
> For the reverse expansion idea, which I personally like as well, we could
>
>
> 2013/6/6 Michael McCandless 
>
>> Neat idea!
>>
>> Would this idea allow a single term to point to (the union of) N other
>> posting lists?  It seems like that's necessary e.g. to handle the
>> exact/inexact case.
>>
>> And then, to produce the Docs/AndPositionsEnum you'd need to do the
>> merge sort across those N posting lists?
>>
>> Such a thing might also be do-able as runtime only wrapper around the
>> postings API (FieldsProducer), if you could at runtime do the reverse
>> expansion (e.g. stem -> all of its surface forms).
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Jun 6, 2013 at 3:51 AM, Dmitry Kan 
>> wrote:
>>
>> > Robert Muir and I have discussed what Robert eventually named "postings
>> > lists deduplication" at bbuzz 2013 conference in Berlin.
>> >
>> > The idea is to allow multiple terms to point to the same postings list
>> to
>> > save space.
>> >
>> > The application / impact of this is positive for synonyms, exact /
>> inexact
>> > terms, leading wildcard support via storing reversed term etc.
>> >
>> > At the moment, when supporting exact (unstemmed) and inexact (stemmed)
>> > searches, we store both unstemmed and stemmed variant of a word form and
>> > that leads to index bloating. For example, we had to remove the leading
>> > wildcard support via reversing a token on index and query time because
>> of
>> > the same index size considerations.
>> >
>> > Would you like a jira for this?
>> >
>> > Thanks,
>> >
>> > Dmitry Kan
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>


Re: postings lists deduplication

2013-06-06 Thread Dmitry Kan
Thanks!

Yes, it could be that allowing single term to point to several posting
lists is good e.g. for synonyms. So that there would be a single entry
point for one synonym (term) of the synonym set and it would find all doc
ids where synonyms of the entry point occur. Or is it being done like this
already?

For the exact / inexact matching, the implementation we have now would
suggest all surface forms occurred in the doc corpus of a word and its stem
to be pointing to a single posting list. Which potentially makes the
inverted index more compact. But maybe maintaining N lists + mergesort is
faster?

For the reverse expansion idea, which I personally like as well, we could


2013/6/6 Michael McCandless 

> Neat idea!
>
> Would this idea allow a single term to point to (the union of) N other
> posting lists?  It seems like that's necessary e.g. to handle the
> exact/inexact case.
>
> And then, to produce the Docs/AndPositionsEnum you'd need to do the
> merge sort across those N posting lists?
>
> Such a thing might also be do-able as runtime only wrapper around the
> postings API (FieldsProducer), if you could at runtime do the reverse
> expansion (e.g. stem -> all of its surface forms).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Jun 6, 2013 at 3:51 AM, Dmitry Kan 
> wrote:
>
> > Robert Muir and I have discussed what Robert eventually named "postings
> > lists deduplication" at bbuzz 2013 conference in Berlin.
> >
> > The idea is to allow multiple terms to point to the same postings list to
> > save space.
> >
> > The application / impact of this is positive for synonyms, exact /
> inexact
> > terms, leading wildcard support via storing reversed term etc.
> >
> > At the moment, when supporting exact (unstemmed) and inexact (stemmed)
> > searches, we store both unstemmed and stemmed variant of a word form and
> > that leads to index bloating. For example, we had to remove the leading
> > wildcard support via reversing a token on index and query time because of
> > the same index size considerations.
> >
> > Would you like a jira for this?
> >
> > Thanks,
> >
> > Dmitry Kan
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


postings lists deduplication

2013-06-06 Thread Dmitry Kan
Robert Muir and I have discussed what Robert eventually named "postings
lists deduplication" at bbuzz 2013 conference in Berlin.

The idea is to allow multiple terms to point to the same postings list to
save space.

The application / impact of this is positive for synonyms, exact / inexact
terms, leading wildcard support via storing reversed term etc.

At the moment, when supporting exact (unstemmed) and inexact (stemmed)
searches, we store both unstemmed and stemmed variant of a word form and
that leads to index bloating. For example, we had to remove the leading
wildcard support via reversing a token on index and query time because of
the same index size considerations.
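
To illustrate the intended saving (purely conceptual, not a real index
structure): the term dictionary would map several term entries to one
postings-list offset instead of writing the list once per term.

[code]
// Conceptual only: a stem and its surface forms sharing one postings list.
Map<String, Long> termToPostingsOffset = new HashMap<String, Long>();
long offset = writePostingsFor("run"); // hypothetical writer, returns a file offset
termToPostingsOffset.put("run", offset);     // stemmed form
termToPostingsOffset.put("running", offset); // surface forms reuse the same list
termToPostingsOffset.put("runs", offset);
[/code]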

Would you like a jira for this?

Thanks,

Dmitry Kan


SOLR-4903 and SOLR-4904

2013-06-06 Thread Dmitry Kan
Hi guys,

As discussed with Grant and Andrzej, I have created two jiras related to an
inefficiency in distributed faceting. This affects 3.4, but my gut feeling
tells me 4.x is affected as well.

Regards,

Dmitry Kan

P.S. Asking this question won yours truly second prize on Stump the chump.
:)


[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Affects Version/s: 4.3

> Solr sends all doc ids to all shards in the query counting facets
> -
>
> Key: SOLR-4903
> URL: https://issues.apache.org/jira/browse/SOLR-4903
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.4, 4.3
>    Reporter: Dmitry Kan
>
> Setup: front end solr and shards.
> Summary: solr frontend sends all doc ids received from QueryComponent to all 
> shards which causes POST request buffer size overflow.
> Symptoms:
> The query is: http://pastebin.com/0DndK1Cs
> I have omitted the shards parameter.
> The router log: http://pastebin.com/FTVH1WF3
> Notice the port of the shard that is affected. That port changes all the time, 
> even for the same request.
> The log entry is prepended with lines:
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
> Internal Server Error
> (they are not in the pastebin link)
> The shard log: http://pastebin.com/exwCx3LX
> Suggestion: change the data structure in FacetComponent to send only doc ids 
> that belong to a shard and not a concatenation of all doc ids.
> Why is this important: for scaling. Adding more shards will result in 
> overflowing the POST request buffer at some point anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4904) Send internal doc ids and index version in distributed faceting to make queries more compact

2013-06-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-4904:


 Summary: Send internal doc ids and index version in distributed 
faceting to make queries more compact
 Key: SOLR-4904
 URL: https://issues.apache.org/jira/browse/SOLR-4904
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.3, 3.4
Reporter: Dmitry Kan


This was suggested by [~ab] at bbuzz conf 2013. It makes a lot of sense and 
works nicely with fixing the root cause of issue SOLR-4903.

Basically QueryComponent could send internal lucene ids along with the index 
version number so that in subsequent queries to other solr components, like 
FacetComponent, the internal ids would be sent. The index version is required 
to ensure we deal with the same index.
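
A hypothetical shape for such a refinement request payload (illustrative
only, nothing like this exists in Solr yet):

[code]
// Internal Lucene doc ids are plain ints, so the follow-up request to a
// shard stays compact; the version guards against a reopened/changed index.
class ShardRefinementRequest {
  long indexVersion;    // the searcher's index version the ids were taken from
  int[] internalDocIds; // internal Lucene ids, only those owned by this shard
}
[/code]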

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-4903:


 Summary: Solr sends all doc ids to all shards in the query 
counting facets
 Key: SOLR-4903
 URL: https://issues.apache.org/jira/browse/SOLR-4903
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.4
Reporter: Dmitry Kan


Setup: front end solr and shards.

Summary: solr frontend sends all doc ids received from QueryComponent to all 
shards which causes POST request buffer size overflow.

Symptoms:

The query is: http://pastebin.com/0DndK1Cs
I have omitted the shards parameter.

The router log: http://pastebin.com/FTVH1WF3
Notice the port of the shard that is affected. That port changes all the time, 
even for the same request.
The log entry is prepended with lines:

SEVERE: org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

(they are not in the pastebin link)

The shard log: http://pastebin.com/exwCx3LX

Suggestion: change the data structure in FacetComponent to send only doc ids 
that belong to a shard and not a concatenation of all doc ids.

Why is this important: for scaling. Adding more shards will result in 
overflowing the POST request buffer at some point anyway.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2013-04-29 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644450#comment-13644450
 ] 

Dmitry Kan commented on SOLR-1726:
--

does the deep paging issue apply to facet paging?

> Deep Paging and Large Results Improvements
> --
>
> Key: SOLR-1726
> URL: https://issues.apache.org/jira/browse/SOLR-1726
> Project: Solr
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.3
>
> Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
> ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, 
> SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java
>
>
> There are possibly ways to improve collections of "deep paging" by passing 
> Solr/Lucene more information about the last page of results seen, thereby 
> saving priority queue operations.   See LUCENE-2215.
> There may also be better options for retrieving large numbers of rows at a 
> time that are worth exploring.  LUCENE-2127.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2013-02-20 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582207#comment-13582207
 ] 

Dmitry Kan commented on LUCENE-1486:


OK, after some study, here is what we did:

we treat the AND clauses as spanNearQuery objects. So, the

a AND b

becomes %a b%~slop, where the %%~ operator is an unordered SpanNear query (a 
change to QueryParser.jj was required for this).

When there is a case of NOT clause with nested clauses:

NOT( (a AND b) OR (c AND d) ) = NOT ( %a b%~slop OR %c d%~slop ) ,

we need to handle SpanNearQueries in the addComplexPhraseClause method. In 
order to handle this, we just added to the if statement:

[code]
if (qc instanceof BooleanQuery) {
[/code]

the following else if statement:

[code]
else if (qc instanceof SpanNearQuery) {
ors.add((SpanQuery) qc);
}
[/code]
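
For reference, the unordered span that %a b%~slop maps to can be built with
the org.apache.lucene.search.spans classes like this (field name and slop
value are illustrative):

[code]
SpanQuery a = new SpanTermQuery(new Term("field", "a"));
SpanQuery b = new SpanTermQuery(new Term("field", "b"));
// inOrder=false makes the SpanNearQuery unordered, matching the %%~ semantics
SpanNearQuery near = new SpanNearQuery(new SpanQuery[] { a, b }, 10, false);
[/code]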


> Wildcards, ORs etc inside Phrase queries
> 
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Affects Versions: 2.4
>Reporter: Mark Harwood
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: ComplexPhraseQueryParser.java, 
> junit_complex_phrase_qp_07_21_2009.patch, 
> junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
> field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
> LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
> TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of 
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in 
> QueryParser itself. This works as a proof of concept  for much of the query 
> parser syntax. Examples from the Junit test include:
>   checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies 
> are OK in phrases
>   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic 
> works
>   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic 
> works.
>   
>   checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a 
> phrase is bad
>   checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases 
> is bad
>   checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries 
> inside phrases not supported
> Code plus Junit test to follow...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2013-02-18 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13580828#comment-13580828
 ] 

Dmitry Kan commented on LUCENE-1486:


Can someone give me a hand with this parser (even though the jira is so old)?

We need to have the NOT logic work properly in the boolean sense, that is the 
following should work correctly:

a AND NOT b
a AND NOT (b OR c)
a AND NOT ((b OR c) AND (d OR e))

Can anybody guide me here? Is it at all possible to accomplish this with the 
original CPQP implementation? I would not be afraid of changing the QueryParser.jj 
lexical specification, if the task requires it.

> Wildcards, ORs etc inside Phrase queries
> 
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Affects Versions: 2.4
>Reporter: Mark Harwood
>Priority: Minor
> Fix For: 4.2, 5.0
>




[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2013-01-18 Thread Dmitry Kan (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557053#comment-13557053 ]

Dmitry Kan commented on SOLR-1604:
--

Hello! Great work!

I have two questions:

1) What would it take to incorporate phrase searches into this extended query 
parser? For example:
"\"a b\" c"~100
that is, the phrase "a b" must match in that order, exactly side by side, and 
appear at most 100 tokens away from c.

2) Does this implementation support the Boolean operators AND, OR, and NOT (at 
least OR and NOT are supported, as far as I can see)? Can they be nested?
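
The nesting asked about in question 1 corresponds roughly to composed span
queries; a sketch built directly against the Lucene 4.x span API, bypassing
any parser (the field name "f" is hypothetical):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class NestedPhraseSketch {
        public static void main(String[] args) {
            // Inner phrase: "a b" with no gap between the terms, in order.
            SpanQuery ab = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("f", "a")),
                new SpanTermQuery(new Term("f", "b"))
            }, 0, true);
            // Outer proximity: the whole phrase within 100 tokens of c.
            SpanQuery query = new SpanNearQuery(new SpanQuery[] {
                ab, new SpanTermQuery(new Term("f", "c"))
            }, 100, false);
            System.out.println(query);
        }
    }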

> Wildcards, ORs etc inside Phrase Queries
> 
>
> Key: SOLR-1604
> URL: https://issues.apache.org/jira/browse/SOLR-1604
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers, search
>Affects Versions: 1.4
>Reporter: Ahmet Arslan
>Priority: Minor
> Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
> ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, ComplexPhrase.zip, 
> ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
> ComplexPhrase.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, 
> SOLR-1604.patch
>
>
> Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
> wildcards, ORs, ranges, fuzzies inside phrase queries.




[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2013-01-16 Thread Dmitry Kan (JIRA)

 [ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Kan updated SOLR-1604:
-

Attachment: ComplexPhrase_solr_3.4.zip

This is the ComplexPhrase project based on the version submitted on 21/Jul/11. 
It compiles and runs under Solr 3.4. I have uncommented the tests in 
org/apache/solr/search/ComplexPhraseQParserPluginTest.java, and they pass.
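
For anyone trying the attachment, registration in solrconfig.xml would look
roughly like this (a sketch; the plugin class name is inferred from the test
path above and may differ in the zip):

    <queryParser name="complexphrase"
                 class="org.apache.solr.search.ComplexPhraseQParserPlugin"/>

after which queries can be routed through it with local params, e.g.
q={!complexphrase}name:"jo* smith".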

> Wildcards, ORs etc inside Phrase Queries
> 
>
> Key: SOLR-1604
> URL: https://issues.apache.org/jira/browse/SOLR-1604
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers, search
>Affects Versions: 1.4
>Reporter: Ahmet Arslan
>Priority: Minor



