Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-14 Thread Bernd Fehling
Your statement "using the old behaviour as a baseline for checking the
correctness of 5.5 behaviour" might be a point of view.

Let me give an example, my query:
q=(text:(star AND trek AND wars)^200 OR text:("star trek wars")^350)
returns 159 hits from 99 million records in the index (version 4.10.4).
I checked all 159 hits, they are correct.

The same query against the same content, indexed with 5.5.3 and also
having 99 million records, returns 0 (zero) hits.

What do you think about this result?

By the way, after copying ExtendedDismaxQParser from 4.10.4 to 5.5.3 I now
get 137 hits. I really don't care about the difference, but at least
I get some hits out of 99 million records and they are correct.

Regards,
Bernd
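
For anyone wanting to reproduce the comparison, here is a minimal sketch. It only builds the request URL for the query quoted above so that the same request can be sent to a 4.10.4 and a 5.5.3 instance with q.op set to AND and then OR. The host and core name (`localhost:8983`, `mycore`) are placeholders and do not come from this thread; `defType`, `q.op`, `rows`, `debugQuery` and `wt` are standard Solr request parameters. No `mm` value is set because the thread does not state which one was used.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint: host and core name are placeholders, not from the thread.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

# The query exactly as quoted in the message above.
QUERY = '(text:(star AND trek AND wars)^200 OR text:("star trek wars")^350)'


def build_debug_url(q_op, base=SOLR_SELECT):
    """Return a select URL that runs QUERY through edismax with the given q.op."""
    params = {
        "q": QUERY,
        "defType": "edismax",
        "q.op": q_op,
        "rows": 0,             # only the hit count and debug section matter here
        "debugQuery": "true",  # asks Solr to include the parsed query
        "wt": "json",
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    for op in ("AND", "OR"):
        print(op, build_debug_url(op))
```

Comparing the `parsedquery` entries in the debug output of both versions should show where the two parsers diverge for this query.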


Am 15.09.2016 um 01:41 schrieb Greg Pendlebury:
> I'm sorry that's been your experience Bernd. If you do manage to find some
> time it would be good to see some details on these bugs. It looks at the
> moment as though this is a matter of perception when using the old
> behaviour as a baseline for checking the correctness of 5.5 behaviour.
> 
> Ta,
> Greg
> 
> 
> On 15 September 2016 at 01:27, Erick Erickson 
> wrote:
> 
>> Perhaps https://issues.apache.org/jira/browse/SOLR-8812 and related?
>>
>> Best,
>> Erick
>>
>> On Tue, Sep 13, 2016 at 11:37 PM, Bernd Fehling
>>  wrote:
>>> Hi Greg,
>>>
>>> after trying several hours with all combinations of parameters and not
>>> getting any useful search result with complex search terms and edismax
>>> I finally copied o.a.s.s.ExtendedDismaxQParser.java from version 4.10.4
>>> to 5.5.3 and did a little modification in o.a.s.u.SolrPluginUtils.java.
>>>
>>> Now it is searching correct and getting logical and valid search results
>>> with any kind of complex search.
>>> Problem solved.
>>>
>>> But still, the edismax, at least of 5.5.3, has some bugs.
>>> If I get time I will look into this but right now my problem is solved
>>> and the customers and users are happy.
>>>
>>> I hope that this buggy edismax version is not used in solr 6.x otherwise
>> you
>>> have the same problems there.
>>>
>>> Regards
>>> Bernd
>>>
>>>
>>> Am 12.09.2016 um 05:10 schrieb Greg Pendlebury:
>>>> Hi Bernd,
>>>>
>>>> "From my point of view the old parsing behavior was correct.
>>>> If searching for a term without operator it is always OR, otherwise
>>>> you can add "+" or "-" to modify that. Now with q.op AND it is
>>>> modified to "+" as a MUST."
>>>>
>>>> It is correct in both cases. q.op dictates (for that query) what default
>>>> operator to use when none is provided, and it takes priority over
>>>> the system-wide 'defaultOperator'. In either case, if you ask it to use
>>>> OR, it uses it; if you ask it to use AND, it uses it. The behaviour from
>>>> 4.10 that was changed (arguably fixed, although I know that is a debatable
>>>> point) was that you asked it to use AND, and it ignored you (irrespective
>>>> of whether you used defaultOperator or q.op). There are a few subtle
>>>> distinctions that are being missed (like the difference between the boolean
>>>> operators and the OCCURS flags that you are talking about), but they are
>>>> not going to change the outcome.
>>>>
>>>> 8812 related to users who had been historically setting the q.op parameter
>>>> to influence the downstream default selection of 'mm' (If you don't provide
>>>> 'mm' it is set for you based on 'q.op') instead of directly setting the
>>>> 'mm' value themselves. But again in this case, you're setting 'mm' anyway,
>>>> so it shouldn't be relevant.
>>>>
>>>> Ta,
>>>> Greg
>>>>
>>>> On 9 September 2016 at 16:44, Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
>>>> wrote:

>>>>> Hi Greg,
>>>>>
>>>>> thanks a lot, that's it.
>>>>> After setting q.op to OR it works _nearly_ as before with 4.10.4.
>>>>>
>>>>> But how stupid is this?
>>>>> I have in my schema 
>>>>> and also had q.op to AND to make sure my default _is_ AND,
>>>>> meant as conjunction between terms.
>>>>> But now I have q.op to OR and defaultOperator in schema to AND
>>>>> to just get _nearly_ my old behavior back.
>>>>>
>>>>> schema has following comment:
>>>>> "... The default is OR, which is generally assumed so it is
>>>>> not a good idea to change it globally here.  The "q.op" request
>>>>> parameter takes precedence over this. ..."
>>>>>
>>>>> What I don't understand is why they change some major internals
>>>>> and don't give any notice about how to keep old parsing behavior.
>>>>>
>>>>> From my point of view the old parsing behavior was correct.
>>>>> If searching for a term without operator it is always OR, otherwise
>>>>> you can add "+" or "-" to modify that. Now with q.op AND it is
>>>>> modified to "+" as a MUST.
>>>>>
>>>>> I still get some differences in search results between 4.10.4 and 5.5.3.
>>>>> What other side effects has this change of q.op from AND to OR in
>>>>> other parts of query handling, parsing and searching?
>>>>>
>>>>> Regards
>>>>> Bernd
>>>>>
>>>>> Am 09.09.2016 um 05:4

Re: (Survey/Experiment) Are you interested in a Solr example reading group?

2016-09-14 Thread tkg_cangkul

I'm very interested in this. Please count me in too, sir :D

Subject: 	Re: (Survey/Experiment) Are you interested in a Solr example 
reading group?

Date:   Wed, 14 Sep 2016 23:32:14 -0400
From:   John Blythe 
Reply-To:   solr-user@lucene.apache.org
To: solr-user 



i'd love to be a part. in a bit of a huge crunch tho at present so i'm not
certain how viable an option it will be for me in the near term.
conceptually tho i'm all for it.

--
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Wed, Sep 14, 2016 at 9:10 PM, Alexandre Rafalovitch 
wrote:


If we get enough people, there are several goals:
*) To help people to be less afraid of the examples and to understand
how to use them/learn from them, including sharing tools to
analyze/review the examples
*) Maybe to play with support tools (e.g. review tools) to find the
best way for people to learn
*) For me (and other committers who joined in) to see where people
have most problems
*) Discover the things in examples that may have stopped working and
we did not notice due to slow evolution or other factors
*) See what's missing in the examples
*) For myself, I am presenting at Solr Revolution on examples. So,
that's a way to both improve my presentation and to deepen my
understanding of what's possible/missing
*) Perhaps, just perhaps, thinking about new set of examples that
better reflect more recent features of Solr

But this all happens only if enough people show interest. Running an
"any questions allowed" is a lot of effort for the organizer(s),
so.. So far, for whatever reason, there has not been enough of
that interest.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 15 September 2016 at 02:43, Georg Sorst  wrote:
> Hi Alexandre,
>
> that's a great idea! Count me in (time permitting...).
>
> I guess the intended outcome is to create documentation issues and fixes?
>
> Best,
> Georg
>
> Alexandre Rafalovitch  schrieb am Di., 13. Sep. 2016
> 18:30:
>
>> Is anybody interested in joining an example reading group for Solr
>> (6.2 or latest).
>>
>> Basic idea: we take one of the examples that ship with Solr and ask
>> each other any and all questions related to it. Basic/beginner level
>> questions are allowed and welcomed. We could also share
>> tools/tips/ideas to make the examples easier to understand, etc.
>>
>> Examples of potentially interesting questions:
>> *) Is this text_rev actually doing anything?
>> *) Why does this search against the example not do anything?
>> *) How do I remove all comments from this example configuration?
>> *) Can I delete this field/type/config section and have the example
>> still work?
>> *) Where is the documentation that makes "this" tick?
>> *) What would this example data look like if it were in XML/CSV/JSONL?
>> *) Is this a bug, a feature, or just me?
>>
>> This would be a separate time-bound group/list/slack (I am
>> open-to-suggestions), so only people interested and ready for
>> simple/narrow-focus questions be there.
>>
>> If you are interested (or even if not), I just setup a very basic
>> survey to give your opinion at: https://www.surveymonkey.com/r/JH8S666
>>
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>







Re: (Survey/Experiment) Are you interested in a Solr example reading group?

2016-09-14 Thread John Blythe
i'd love to be a part. in a bit of a huge crunch tho at present so i'm not
certain how viable an option it will be for me in the near term.
conceptually tho i'm all for it.

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Wed, Sep 14, 2016 at 9:10 PM, Alexandre Rafalovitch 
wrote:

> If we get enough people, there are several goals:
> *) To help people to be less afraid of the examples and to understand
> how to use them/learn from them, including sharing tools to
> analyze/review the examples
> *) Maybe to play with support tools (e.g. review tools) to find the
> best way for people to learn
> *) For me (and other committers who joined in) to see where people
> have most problems
> *) Discover the things in examples that may have stopped working and
> we did not notice due to slow evolution or other factors
> *) See what's missing in the examples
> *) For myself, I am presenting at Solr Revolution on examples. So,
> that's a way to both improve my presentation and to deepen my
> understanding of what's possible/missing
> *) Perhaps, just perhaps, thinking about new set of examples that
> better reflect more recent features of Solr
>
> But this all happens only if enough people show interest. Running an
> "any questions allowed" is a lot of effort for the organizer(s),
> so.. So far, for whatever reason, there has not been enough of
> that interest.
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 15 September 2016 at 02:43, Georg Sorst  wrote:
> > Hi Alexandre,
> >
> > that's a great idea! Count me in (time permitting...).
> >
> > I guess the intended outcome is to create documentation issues and fixes?
> >
> > Best,
> > Georg
> >
> > Alexandre Rafalovitch  schrieb am Di., 13. Sep. 2016
> > 18:30:
> >
> >> Is anybody interested in joining an example reading group for Solr
> >> (6.2 or latest).
> >>
> >> Basic idea: we take one of the examples that ship with Solr and ask
> >> each other any and all questions related to it. Basic/beginner level
> >> questions are allowed and welcomed. We could also share
> >> tools/tips/ideas to make the examples easier to understand, etc.
> >>
> >> Examples of potentially interesting questions:
> >> *) Is this text_rev actually doing anything?
> >> *) Why does this search against the example not do anything?
> >> *) How do I remove all comments from this example configuration?
> >> *) Can I delete this field/type/config section and have the example
> >> still work?
> >> *) Where is the documentation that makes "this" tick?
> >> *) What would this example data look like if it were in XML/CSV/JSONL?
> >> *) Is this a bug, a feature, or just me?
> >>
> >> This would be a separate time-bound group/list/slack (I am
> >> open-to-suggestions), so only people interested and ready for
> >> simple/narrow-focus questions be there.
> >>
> >> If you are interested (or even if not), I just setup a very basic
> >> survey to give your opinion at: https://www.surveymonkey.com/r/JH8S666
> >>
> >> Regards,
> >>Alex.
> >> 
> >> Newsletter and resources for Solr beginners and intermediates:
> >> http://www.solr-start.com/
> >>
>


Re: (Survey/Experiment) Are you interested in a Solr example reading group?

2016-09-14 Thread Alexandre Rafalovitch
If we get enough people, there are several goals:
*) To help people to be less afraid of the examples and to understand
how to use them/learn from them, including sharing tools to
analyze/review the examples
*) Maybe to play with support tools (e.g. review tools) to find the
best way for people to learn
*) For me (and other committers who joined in) to see where people
have most problems
*) Discover the things in examples that may have stopped working and
we did not notice due to slow evolution or other factors
*) See what's missing in the examples
*) For myself, I am presenting at Solr Revolution on examples. So,
that's a way to both improve my presentation and to deepen my
understanding of what's possible/missing
*) Perhaps, just perhaps, thinking about a new set of examples that
better reflect more recent features of Solr

But this all happens only if enough people show interest. Running an
"any questions allowed" group is a lot of effort for the organizer(s),
so... So far, for whatever reason, there has not been enough of
that interest.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 15 September 2016 at 02:43, Georg Sorst  wrote:
> Hi Alexandre,
>
> that's a great idea! Count me in (time permitting...).
>
> I guess the intended outcome is to create documentation issues and fixes?
>
> Best,
> Georg
>
> Alexandre Rafalovitch  schrieb am Di., 13. Sep. 2016
> 18:30:
>
>> Is anybody interested in joining an example reading group for Solr
>> (6.2 or latest).
>>
>> Basic idea: we take one of the examples that ship with Solr and ask
>> each other any and all questions related to it. Basic/beginner level
>> questions are allowed and welcomed. We could also share
>> tools/tips/ideas to make the examples easier to understand, etc.
>>
>> Examples of potentially interesting questions:
>> *) Is this text_rev actually doing anything?
>> *) Why does this search against the example not do anything?
>> *) How do I remove all comments from this example configuration?
>> *) Can I delete this field/type/config section and have the example still
>> work?
>> *) Where is the documentation that makes "this" tick?
>> *) What would this example data look like if it were in XML/CSV/JSONL?
>> *) Is this a bug, a feature, or just me?
>>
>> This would be a separate time-bound group/list/slack (I am
>> open-to-suggestions), so only people interested and ready for
>> simple/narrow-focus questions be there.
>>
>> If you are interested (or even if not), I just setup a very basic
>> survey to give your opinion at: https://www.surveymonkey.com/r/JH8S666
>>
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>


Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-14 Thread Yonik Seeley
On Sun, Sep 11, 2016 at 11:29 PM, Greg Pendlebury
 wrote:
> I'm not certain what is going on with your boost. It doesn't seem related
> to those tickets as far as I can see, but I note it comes back in the
> 'parsedquery_toString' step below that. Perhaps the debug output has a
> display bug?

Yeah, it's likely a display bug.  A change in Lucene from per-query
boosts to BoostQuery caused some issues (refactoring bugs):
https://issues.apache.org/jira/browse/LUCENE-6590

-Yonik


Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-14 Thread Greg Pendlebury
I'm sorry that's been your experience Bernd. If you do manage to find some
time it would be good to see some details on these bugs. It looks at the
moment as though this is a matter of perception when using the old
behaviour as a baseline for checking the correctness of 5.5 behaviour.

Ta,
Greg


On 15 September 2016 at 01:27, Erick Erickson 
wrote:

> Perhaps https://issues.apache.org/jira/browse/SOLR-8812 and related?
>
> Best,
> Erick
>
> On Tue, Sep 13, 2016 at 11:37 PM, Bernd Fehling
>  wrote:
> > Hi Greg,
> >
> > after trying several hours with all combinations of parameters and not
> > getting any useful search result with complex search terms and edismax
> > I finally copied o.a.s.s.ExtendedDismaxQParser.java from version 4.10.4
> > to 5.5.3 and did a little modification in o.a.s.u.SolrPluginUtils.java.
> >
> > Now it is searching correct and getting logical and valid search results
> > with any kind of complex search.
> > Problem solved.
> >
> > But still, the edismax, at least of 5.5.3, has some bugs.
> > If I get time I will look into this but right now my problem is solved
> > and the customers and users are happy.
> >
> > I hope that this buggy edismax version is not used in solr 6.x otherwise
> you
> > have the same problems there.
> >
> > Regards
> > Bernd
> >
> >
> > Am 12.09.2016 um 05:10 schrieb Greg Pendlebury:
> >> Hi Bernd,
> >>
> >> "From my point of view the old parsing behavior was correct.
> >> If searching for a term without operator it is always OR, otherwise
> >> you can add "+" or "-" to modify that. Now with q.op AND it is
> >> modified to "+" as a MUST."
> >>
> >> It is correct in both cases. q.op dictates (for that query) what default
> >> operator to use when none is provided, and it takes priority over
> >> the system-wide 'defaultOperator'. In either case, if you ask it to use
> >> OR, it uses it; if you ask it to use AND, it uses it. The behaviour from
> >> 4.10 that was changed (arguably fixed, although I know that is a debatable
> >> point) was that you asked it to use AND, and it ignored you (irrespective
> >> of whether you used defaultOperator or q.op). There are a few subtle
> >> distinctions that are being missed (like the difference between the boolean
> >> operators and the OCCURS flags that you are talking about), but they are
> >> not going to change the outcome.
> >>
> >> 8812 related to users who had been historically setting the q.op parameter
> >> to influence the downstream default selection of 'mm' (If you don't provide
> >> 'mm' it is set for you based on 'q.op') instead of directly setting the
> >> 'mm' value themselves. But again in this case, you're setting 'mm' anyway,
> >> so it shouldn't be relevant.
> >>
> >> Ta,
> >> Greg
> >>
> >> On 9 September 2016 at 16:44, Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
> >> wrote:
> >>
> >>> Hi Greg,
> >>>
> >>> thanks a lot, thats it.
> >>> After setting q.op to OR it works _nearly_ as before with 4.10.4.
> >>>
> >>> But how stupid this?
> >>> I have in my schema 
> >>> and also had q.op to AND to make sure my default _is_ AND,
> >>> meant as conjunction between terms.
> >>> But now I have q.op to OR and defaultOperator in schema to AND
> >>> to just get _nearly_ my old behavior back.
> >>>
> >>> schema has following comment:
> >>> "... The default is OR, which is generally assumed so it is
> >>> not a good idea to change it globally here.  The "q.op" request
> >>> parameter takes precedence over this. ..."
> >>>
> >>> What I don't understand is why they change some major internals
> >>> and don't give any notice about how to keep old parsing behavior.
> >>>
> >>> From my point of view the old parsing behavior was correct.
> >>> If searching for a term without operator it is always OR, otherwise
> >>> you can add "+" or "-" to modify that. Now with q.op AND it is
> >>> modified to "+" as a MUST.
> >>>
> >>> I still get some differences in search results between 4.10.4 and
> 5.5.3.
> >>> What other side effects has this change of q.op from AND to OR in
> >>> other parts of query handling, parsing and searching?
> >>>
> >>> Regards
> >>> Bernd
> >>>
> >>> Am 09.09.2016 um 05:43 schrieb Greg Pendlebury:
> >>>> I forgot to mention the tickets:
> >>>> SOLR-2649 and SOLR-8812
> >>>>
> >>>> On 9 September 2016 at 13:38, Greg Pendlebury <greg.pendleb...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Under 4.10 q.op was ignored by the edismax parser and always forced to OR.
> >>>>> 5.5 is looking at the q.op=AND you requested.
> >>>>>
> >>>>> There are also some changes to the default values selected for mm, but I
> >>>>> doubt those apply here since you are setting it explicitly.
> >>>>>
> >>>>> On 8 September 2016 at 00:35, Mikhail Khludnev wrote:
> >>>>>
> >>>>>> I suppose
> >>>>>>    +((text:star text:trek)~2)
> >>>>>> and
> >>>>>>    +(+text:star +text:trek)
> >>>>>> are equal. mm=2 is equal to +foo +bar
> >>>>>>
> >>>>>> On Wed, Sep 7, 2016 at 10:52 AM, Bernd Fehling <

Re: SQL Joins in Parallel SQL Interface

2016-09-14 Thread Joel Bernstein
Hi,

Parallel SQL does not yet support joins but Streaming Expressions does.

There are 4 types of aggregations in Streaming Expression currently. The
functions are:

facet: group by aggregations pushed down to the JSON facet API. Will not
work with joins.
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-facet

stats: aggregations without group by, pushed down to the stats component
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-stats

rollup: MapReduce group by aggregations which can be used with joins.
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-rollup

gatherNodes: graph query aggregations
https://cwiki.apache.org/confluence/display/solr/Graph+Traversal

Past discussion on rollup and joins:
http://lucene.472066.n3.nabble.com/Solr-6-Use-facet-with-Streaming-Expressions-LeftOuterJoin-td4290526.html
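
As a rough illustration of the rollup-with-joins combination mentioned above, the sketch below assembles a Streaming Expression string that joins two collections and then aggregates the joined stream. The collection names (`customers`, `orders`), the field names, and the base URL are hypothetical; `search`, `innerJoin`, `rollup`, `count(*)` and `sum(...)` are Streaming Expression functions, and `expr` is the parameter read by the `/stream` handler. Both searches are sorted on the join key because `innerJoin` and `rollup` expect their input ordered that way, and fields read through `/export` need docValues.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the collection in the path is a placeholder.
STREAM_URL = "http://localhost:8983/solr/customers/stream"


def search(collection, fl, sort, q="*:*"):
    """Render a search() source that streams the full result set via /export."""
    return f'search({collection}, q={q}, fl="{fl}", sort="{sort}", qt="/export")'


def inner_join(left, right, on):
    """Render innerJoin(); both inputs must already be sorted on the join key."""
    return f'innerJoin({left}, {right}, on="{on}")'


def rollup(stream, over, *metrics):
    """Render rollup(); the input must be sorted on the 'over' field."""
    return f'rollup({stream}, over="{over}", {", ".join(metrics)})'


def build_expression():
    customers = search("customers", "customerId,region", "customerId asc")
    orders = search("orders", "customerId,amount", "customerId asc")
    joined = inner_join(customers, orders, on="customerId")
    return rollup(joined, "customerId", "count(*)", "sum(amount)")


def build_stream_url(base=STREAM_URL):
    return base + "?" + urlencode({"expr": build_expression()})


if __name__ == "__main__":
    print(build_expression())
    print(build_stream_url())
```

This is only a string builder; it does not contact Solr. The resulting URL could be fetched with any HTTP client.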



Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Sep 14, 2016 at 6:19 PM, Aswath Srinivasan (TMS) <
aswath.sriniva...@toyota.com> wrote:

> Hello,
>
> I'm exploring the Parallel SQL. I don't see any SQL JOIN features
> available in the parallel SQL interface, in the documentation. Is it even
> possible to do SQL JOIN in the parallel SQL interface?
>
> Was looking at streaming expression but looks like facets are not possible
> with it. Not even count(*) kind of operations?
>
> Thank you,
> Aswath NS
>
>


SQL Joins in Parallel SQL Interface

2016-09-14 Thread Aswath Srinivasan (TMS)
Hello,

I'm exploring the Parallel SQL. I don't see any SQL JOIN features available in 
the parallel SQL interface, in the documentation. Is it even possible to do SQL 
JOIN in the parallel SQL interface?

Was looking at streaming expression but looks like facets are not possible with 
it. Not even count(*) kind of operations?

Thank you,
Aswath NS



Re: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Pushkar Raste
Damn, I didn't put comments in the ticket but replied to the question "Is it
safe to upgrade an existing field to docvalues?" on the mailing list.

Check that out

On Sep 14, 2016 5:59 PM, "Pushkar Raste"  wrote:

> We experienced the exact opposite issue on Solr 4.10
>
> Check my comments in https://issues.apache.org/jira/browse/SOLR-9437
>
> I am not sure if the issue was fixed in Solr 6
>
> I would be interested in tracking down a patch for this.
>
> On Sep 14, 2016 3:04 PM, "Erick Erickson"  wrote:
>
>> Weird indeed. Optimize _shouldn't_ be necessary if the index was
>> rebuilt from scratch after changing something like DV, but in a mixed
>> set of segments I'm not sure what would happen. Perhaps one of the
>> Lucene folks can chime in?
>>
>> Best,
>> Erick
>>
>> On Wed, Sep 14, 2016 at 9:22 AM, Markus Jelsma
>>  wrote:
>> > Well, it could be that indeed. I know i enabled docValues on that field
>> three and a half months ago. But usually when i do that, i force an
>> optimize.
>> >
>> > On the other hand, i'd reckon that in the past few months, all segments
>> should have been merged with another one at least once because data keeps
>> streaming in. But i'm not sure it would anyway.
>> >
>> > Thanks,
>> > Markus
>> >
>> > -Original message-
>> >> From:Erick Erickson 
>> >> Sent: Wednesday 14th September 2016 17:22
>> >> To: solr-user 
>> >> Subject: Re: Facetting on a field doesn't work, until i optimized the
>> index
>> >>
>> >> That's strange
>> >>
>> >> Is there any chance that the schema changed? This is _really_ a shot
>> >> in the dark, but perhaps the optimize "normalized" the field
>> >> definitions stored with each segment.
>> >>
>> >> Imagine segments 1-5 have one definition, and segments 6-10 have a
>> >> different definition for your field. Optimize would have to resolve
>> >> this somehow, perhaps that process made the magic happen?
>> >>
>> >> NOTE: I'm not conversant with the internals of merge, so this may be
>> >> totally bogus..
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Wed, Sep 14, 2016 at 6:15 AM, Markus Jelsma
>> >>  wrote:
>> >> > Hello - we've just spotted the weirdest issue on Solr 6.1.
>> >> >
>> >> > We have a Solr index full of logs, new items are added every few
>> minutes. We also have an application that shows charts based on what's in
>> the index, Banana style.
>> >> >
>> >> > Yesterday we saw facets for a specific field were missing. Today we
>> checked it out until we reduced the facet query just to
>> facet=true&facet.field=FIELD, but it returned nothing of use, just an empty
>> set of facets.
>> >> >
>> >> > My colleague suggested the crazy idea to optimize the index, i
>> protested because it is no use, numDoc always equals maxDoc and the
>> optimize button was missing anyway. So i forced an optimize via the URL,
>> and it worked, the facets for that field are now back!
>> >> >
>> >> > Any ideas? Is there a related ticket?
>> >> >
>> >> > Thanks,
>> >> > Markus
>> >>
>>
>


Re: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Pushkar Raste
We experienced the exact opposite issue on Solr 4.10

Check my comments in https://issues.apache.org/jira/browse/SOLR-9437

I am not sure if the issue was fixed in Solr 6

I would be interested in tracking down a patch for this.

On Sep 14, 2016 3:04 PM, "Erick Erickson"  wrote:

> Weird indeed. Optimize _shouldn't_ be necessary if the index was
> rebuilt from scratch after changing something like DV, but in a mixed
> set of segments I'm not sure what would happen. Perhaps one of the
> Lucene folks can chime in?
>
> Best,
> Erick
>
> On Wed, Sep 14, 2016 at 9:22 AM, Markus Jelsma
>  wrote:
> > Well, it could be that indeed. I know i enabled docValues on that field
> three and a half months ago. But usually when i do that, i force an
> optimize.
> >
> > On the other hand, i'd reckon that in the past few months, all segments
> should have been merged with another one at least once because data keeps
> streaming in. But i'm not sure it would anyway.
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> >> From:Erick Erickson 
> >> Sent: Wednesday 14th September 2016 17:22
> >> To: solr-user 
> >> Subject: Re: Facetting on a field doesn't work, until i optimized the
> index
> >>
> >> That's strange
> >>
> >> Is there any chance that the schema changed? This is _really_ a shot
> >> in the dark, but perhaps the optimize "normalized" the field
> >> definitions stored with each segment.
> >>
> >> Imagine segments 1-5 have one definition, and segments 6-10 have a
> >> different definition for your field. Optimize would have to resolve
> >> this somehow, perhaps that process made the magic happen?
> >>
> >> NOTE: I'm not conversant with the internals of merge, so this may be
> >> totally bogus..
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Sep 14, 2016 at 6:15 AM, Markus Jelsma
> >>  wrote:
> >> > Hello - we've just spotted the weirdest issue on Solr 6.1.
> >> >
> >> > We have a Solr index full of logs, new items are added every few
> minutes. We also have an application that shows charts based on what's in
> the index, Banana style.
> >> >
> >> > Yesterday we saw facets for a specific field were missing. Today we
> checked it out until we reduced the facet query just to
> facet=true&facet.field=FIELD, but it returned nothing of use, just an empty
> set of facets.
> >> >
> >> > My colleague suggested the crazy idea to optimize the index, i
> protested because it is no use, numDoc always equals maxDoc and the
> optimize button was missing anyway. So i forced an optimize via the URL,
> and it worked, the facets for that field are now back!
> >> >
> >> > Any ideas? Is there a related ticket?
> >> >
> >> > Thanks,
> >> > Markus
> >>
>


Re: Miserable Experience Using Solr. Again.

2016-09-14 Thread Gus Heck
While Stack Overflow is a great place, and the more good info that exists
there, the merrier, I think Solr should have its own complete docs, in
addition to anything found on 3rd party sites. Each hop to a new location
is a chance for the user to get lost, and the content on 3rd party sites
could be wrong or out of date without folks here being aware of it as
quickly.

Also, I just noticed that there seem to be some links at the bottom of the
admin UI, but they often run off the bottom and can be easily missed. The
"documentation" link doesn't actually lead to the documentation... Maybe
those should be along the top where they would be easily seen and the
link(s?) fixed up?

-Gus

On Wed, Sep 14, 2016 at 3:27 PM, Jan Høydahl  wrote:

> > If you could decide, what kind of documentation would you want from the
> project? A very short “Solr Quick start guide”? with step-by-step
> instructions for the most common tasks from a User perspective?
>
> I just became aware of StackOverflow’s Documentation project, which also
> has a solr topic:
> http://stackoverflow.com/documentation/solr
> Perhaps that could also be a good place to contribute HOWTOs and more
> end-user focused docs?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>


-- 
http://www.the111shift.com


Re: (Survey/Experiment) Are you interested in a Solr example reading group?

2016-09-14 Thread Georg Sorst
Hi Alexandre,

that's a great idea! Count me in (time permitting...).

I guess the intended outcome is to create documentation issues and fixes?

Best,
Georg

Alexandre Rafalovitch  schrieb am Di., 13. Sep. 2016
18:30:

> Is anybody interested in joining an example reading group for Solr
> (6.2 or latest).
>
> Basic idea: we take one of the examples that ship with Solr and ask
> each other any and all questions related to it. Basic/beginner level
> questions are allowed and welcomed. We could also share
> tools/tips/ideas to make the examples easier to understand, etc.
>
> Examples of potentially interesting questions:
> *) Is this text_rev actually doing anything?
> *) Why does this search against the example not do anything?
> *) How do I remove all comments from this example configuration?
> *) Can I delete this field/type/config section and have the example still
> work?
> *) Where is the documentation that makes "this" tick?
> *) What would this example data look like if it were in XML/CSV/JSONL?
> *) Is this a bug, a feature, or just me?
>
> This would be a separate time-bound group/list/slack (I am
> open-to-suggestions), so only people interested and ready for
> simple/narrow-focus questions be there.
>
> If you are interested (or even if not), I just setup a very basic
> survey to give your opinion at: https://www.surveymonkey.com/r/JH8S666
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>


Re: Miserable Experience Using Solr. Again.

2016-09-14 Thread Jan Høydahl
> If you could decide, what kind of documentation would you want from the 
> project? A very short “Solr Quick start guide”? with step-by-step 
> instructions for the most common tasks from a User perspective?

I just became aware of StackOverflow’s Documentation project, which also has a 
solr topic:
http://stackoverflow.com/documentation/solr 

Perhaps that could also be a good place to contribute HOWTOs and more end-user 
focused docs?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Erick Erickson
Weird indeed. Optimize _shouldn't_ be necessary if the index was
rebuilt from scratch after changing something like DV, but in a mixed
set of segments I'm not sure what would happen. Perhaps one of the
Lucene folks can chime in?

Best,
Erick
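
The check described further down in this thread (a bare facet request, then a forced optimize via the URL, then the same facet request again) could be scripted along these lines. This is a sketch under assumptions: the base URL and core name (`logs`) are placeholders, and `FIELD` stands in for the real field name, which the thread does not give. `facet=true&facet.field=FIELD` is taken from the original report, and `optimize=true` on the update handler is the usual way to force an optimize over HTTP.

```python
from urllib.parse import urlencode

# Hypothetical core URL; host and core name are placeholders.
SOLR_CORE = "http://localhost:8983/solr/logs"


def facet_url(field, base=SOLR_CORE):
    """URL for a match-all query that returns only facet counts for one field."""
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"


def optimize_url(base=SOLR_CORE):
    """URL that asks the update handler to force-merge (optimize) the index."""
    return f"{base}/update?{urlencode({'optimize': 'true'})}"


if __name__ == "__main__":
    print("before:", facet_url("FIELD"))
    print("force :", optimize_url())
    print("after :", facet_url("FIELD"))
```

Running the facet request before and after the optimize, and keeping both responses, would give the Lucene folks something concrete to look at.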

On Wed, Sep 14, 2016 at 9:22 AM, Markus Jelsma
 wrote:
> Well, it could be that indeed. I know i enabled docValues on that field three 
> and a half months ago. But usually when i do that, i force an optimize.
>
> On the other hand, i'd reckon that in the past few months, all segments 
> should have been merged with another one at least once because data keeps 
> streaming in. But i'm not sure it would anyway.
>
> Thanks,
> Markus
>
> -Original message-
>> From:Erick Erickson 
>> Sent: Wednesday 14th September 2016 17:22
>> To: solr-user 
>> Subject: Re: Facetting on a field doesn't work, until i optimized the index
>>
>> That's strange
>>
>> Is there any chance that the schema changed? This is _really_ a shot
>> in the dark, but perhaps the optimize "normalized" the field
>> definitions stored with each segment.
>>
>> Imagine segments 1-5 have one definition, and segments 6-10 have a
>> different definition for your field. Optimize would have to resolve
>> this somehow, perhaps that process made the magic happen?
>>
>> NOTE: I'm not conversant with the internals of merge, so this may be
>> totally bogus..
>>
>> Best,
>> Erick
>>
>> On Wed, Sep 14, 2016 at 6:15 AM, Markus Jelsma
>>  wrote:
>> > Hello - we've just spotted the weirdest issue on Solr 6.1.
>> >
>> > We have a Solr index full of logs, new items are added every few minutes. 
>> > We also have an application that shows charts based on what's in the 
>> > index, Banana style.
>> >
>> > Yesterday we saw facets for a specific field were missing. Today we 
>> > checked it out until we reduced the facet query just to 
>> > facet=true&facet.field=FIELD, but it returned nothing of use, just an 
>> > empty set of facets.
>> >
>> > My colleague suggested the crazy idea to optimize the index, i protested 
>> > because it is no use, numDoc always equals maxDoc and the optimize button 
>> > was missing anyway. So i forced an optimize via the URL, and it worked, 
>> > the facets for that field are now back!
>> >
>> > Any ideas? Is there a related ticket?
>> >
>> > Thanks,
>> > Markus
>>


RE: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Markus Jelsma
Well, it could be that indeed. I know i enabled docValues on that field three 
and a half months ago. But usually when i do that, i force an optimize.

On the other hand, i'd reckon that in the past few months, all segments should 
have been merged with another one at least once because data keeps streaming 
in. But i'm not sure it would anyway.

Thanks,
Markus

-Original message-
> From:Erick Erickson 
> Sent: Wednesday 14th September 2016 17:22
> To: solr-user 
> Subject: Re: Facetting on a field doesn't work, until i optimized the index
> 
> That's strange
> 
> Is there any chance that the schema changed? This is _really_ a shot
> in the dark, but perhaps the optimize "normalized" the field
> definitions stored with each segment.
> 
> Imagine segments 1-5 have one definition, and segments 6-10 have a
> different definition for your field. Optimize would have to resolve
> this somehow, perhaps that process made the magic happen?
> 
> NOTE: I'm not conversant with the internals of merge, so this may be
> totally bogus..
> 
> Best,
> Erick
> 
> On Wed, Sep 14, 2016 at 6:15 AM, Markus Jelsma
>  wrote:
> > Hello - we've just spotted the weirdest issue on Solr 6.1.
> >
> > We have a Solr index full of logs, new items are added every few minutes. 
> > We also have an application that shows charts based on what's in the index, 
> > Banana style.
> >
> > Yesterday we saw facets for a specific field were missing. Today we checked 
> > it out until we reduced the facet query just to 
> > facet=true&facet.field=FIELD, but it returned nothing of use, just an empty 
> > set of facets.
> >
> > My colleague suggested the crazy idea to optimize the index, i protested 
> > because it is no use, numDoc always equals maxDoc and the optimize button 
> > was missing anyway. So i forced an optimize via the URL, and it worked, the 
> > facets for that field are now back!
> >
> > Any ideas? Is there a related ticket?
> >
> > Thanks,
> > Markus
> 


Re: Solr on HDFS: adding a shard replica

2016-09-14 Thread Erick Erickson
The core_node name is largely irrelevant; you should have more descriptive
names in the state.json file, like collection1_shard1_replica1.
You happen to see 19 because you have only one replica per shard.

Exactly how are you creating the replica? What version of Solr? If
you're using the "core admin" UI, it's tricky to get right. I'd
strongly recommend using the "collections API, ADDREPLICA" command,
see: 
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
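A minimal sketch of what such an ADDREPLICA request could look like when built from a script. The base URL, collection name, and shard name below are placeholders, not values from this thread; only the `action`, `collection`, `shard`, and optional `node` parameters come from the Collections API.

```python
from urllib.parse import urlencode


def addreplica_url(base_url, collection, shard, node=None):
    """Build a Collections API ADDREPLICA request URL.

    base_url, collection and shard are placeholders supplied by the caller.
    """
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Optional: ask Solr to place the new replica on a specific node.
        params["node"] = node
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and names, for illustration only.
url = addreplica_url("http://localhost:8983/solr", "collection1", "shard1")
print(url)
```

Sending this URL with any HTTP client lets Solr create the core and register it in the cluster state itself, instead of relying on a hand-built core.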

Best,
Erick

On Tue, Sep 13, 2016 at 7:11 PM, Chetas Joshi  wrote:
> Is this happening because I have set replicationFactor=1?
> So even if I manually add replica for the shard that's down, it will just
> create a dataDir but would not copy any of the data into the dataDir?
>
> On Tue, Sep 13, 2016 at 6:07 PM, Chetas Joshi 
> wrote:
>
>> Hi,
>>
>> I just started experimenting with solr cloud.
>>
>> I have a solr cloud of 20 nodes. I have one collection with 18 shards
>> running on 18 different nodes with replication factor=1.
>>
>> When one of my shards goes down, I create a replica using the Solr UI. On
>> HDFS I see a core getting added. But the data (index table and tlog)
>> information does not get copied over to that directory. For example, on
>> HDFS I have
>>
>> /solr/collection/core_node_1/data/index
>> /solr/collection/core_node_1/data/tlog
>>
>> when I create a replica of a shard, it creates
>>
>> /solr/collection/core_node_19/data/index
>> /solr/collection/core_node_19/data/tlog
>>
>> (core_node_19 as I already have 18 shards for the collection). The issue
>> is both my folders  core_node_19/data/index and core_node_19/data/tlog are
>> empty. Data does not get copied over from core_node_1/data/index and
>> core_node_1/data/tlog.
>>
>> I need to remove core_node_1 and just keep core_node_19 (the replica). Why
>> the data is not getting copied over? Do I need to manually move all the
>> data from one folder to the other?
>>
>> Thank you,
>> Chetas.
>>
>>


Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-14 Thread Erick Erickson
Perhaps https://issues.apache.org/jira/browse/SOLR-8812 and related?
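For readers following the quoted thread: the claim that `(text:star text:trek)~2` and `+text:star +text:trek` select the same documents can be checked with a toy model. This is only a sketch of the matching logic (not of Lucene itself, and it says nothing about scoring); the function names and the three-word vocabulary are made up for illustration.

```python
from itertools import combinations


def matches_min_should_match(doc_terms, optional_terms, mm):
    """True if at least `mm` of the optional terms occur in the document."""
    hits = sum(1 for term in optional_terms if term in doc_terms)
    return hits >= mm


def matches_all_required(doc_terms, required_terms):
    """True if every required (+term) clause occurs in the document."""
    return all(term in doc_terms for term in required_terms)


terms = ["star", "trek"]
vocabulary = ["star", "trek", "wars"]

# Every possible toy document over the vocabulary, including the empty one.
docs = [set(c) for n in range(len(vocabulary) + 1)
        for c in combinations(vocabulary, n)]

# Compare the two query forms on every toy document.
agree = all(
    matches_min_should_match(doc, terms, 2) == matches_all_required(doc, terms)
    for doc in docs
)
print(agree)
```

With two optional clauses and mm=2 both forms require both terms, so the sets of matching documents coincide in this model.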

Best,
Erick

On Tue, Sep 13, 2016 at 11:37 PM, Bernd Fehling
 wrote:
> Hi Greg,
>
> after trying several hours with all combinations of parameters and not
> getting any useful search result with complex search terms and edismax
> I finally copied o.a.s.s.ExtendedDismaxQParser.java from version 4.10.4
> to 5.5.3 and did a little modification in o.a.s.u.SolrPluginUtils.java.
>
> Now it is searching correct and getting logical and valid search results
> with any kind of complex search.
> Problem solved.
>
> But still, the edismax, at least of 5.5.3, has some bugs.
> If I get time I will look into this but right now my problem is solved
> and the customers and users are happy.
>
> I hope that this buggy edismax version is not used in solr 6.x otherwise you
> have the same problems there.
>
> Regards
> Bernd
>
>
> Am 12.09.2016 um 05:10 schrieb Greg Pendlebury:
>> Hi Bernd,
>>
>> "From my point of view the old parsing behavior was correct.
>> If searching for a term without operator it is always OR, otherwise
>> you can add "+" or "-" to modify that. Now with q.op AND it is
>> modified to "+" as a MUST."
>>
> >> It is correct in both cases. q.op dictates (for that query) what default
> >> operator to use when none is provided, and it takes priority over
> >> the system-wide 'defaultOperator'. In either case, if you ask it to use
> >> OR, it uses it; if you ask it to use AND, it uses it. The behaviour from
> >> 4.10 that was changed (arguably fixed, although I know that is a debatable
> >> point) was that you asked it to use AND, and it ignored you (irrespective
> >> of whether you used defaultOperator or q.op). There are a few subtle
> >> distinctions that are being missed (like the difference between the boolean
> >> operators and the OCCURS flags that you are talking about), but they are
> >> not going to change the outcome.
>>
>> 8812 related to users who had been historically setting the q.op parameter
>> to influence the downstream default selection of 'mm' (If you don't provide
>> 'mm' it is set for you based on 'q.op') instead of directly setting the
>> 'mm' value themselves. But again in this case, you're setting 'mm' anyway,
>> so it shouldn't be relevant.
>>
>> Ta,
>> Greg
>>
>> On 9 September 2016 at 16:44, Bernd Fehling 
>> wrote:
>>
>>> Hi Greg,
>>>
>>> thanks a lot, thats it.
>>> After setting q.op to OR it works _nearly_ as before with 4.10.4.
>>>
> >>> But how stupid is this?
>>> I have in my schema 
>>> and also had q.op to AND to make sure my default _is_ AND,
>>> meant as conjunction between terms.
>>> But now I have q.op to OR and defaultOperator in schema to AND
>>> to just get _nearly_ my old behavior back.
>>>
>>> schema has following comment:
>>> "... The default is OR, which is generally assumed so it is
>>> not a good idea to change it globally here.  The "q.op" request
>>> parameter takes precedence over this. ..."
>>>
>>> What I don't understand is why they change some major internals
>>> and don't give any notice about how to keep old parsing behavior.
>>>
>>> From my point of view the old parsing behavior was correct.
>>> If searching for a term without operator it is always OR, otherwise
>>> you can add "+" or "-" to modify that. Now with q.op AND it is
>>> modified to "+" as a MUST.
>>>
>>> I still get some differences in search results between 4.10.4 and 5.5.3.
>>> What other side effects has this change of q.op from AND to OR in
>>> other parts of query handling, parsing and searching?
>>>
>>> Regards
>>> Bernd
>>>
>>> Am 09.09.2016 um 05:43 schrieb Greg Pendlebury:
 I forgot to mention the tickets:
 SOLR-2649 and SOLR-8812

 On 9 September 2016 at 13:38, Greg Pendlebury >>>
 wrote:

> Under 4.10 q.op was ignored by the edismax parser and always forced to
>>> OR.
> 5.5 is looking at the q.op=AND you requested.
>
> There are also some changes to the default values selected for mm, but I
> doubt those apply here since you are setting it explicitly.
>
> On 8 September 2016 at 00:35, Mikhail Khludnev  wrote:
>
>> I suppose
>>+((text:star text:trek)~2)
>> and
>>   +(+text:star +text:trek)
>> are equal. mm=2 is equal to +foo +bar
>>
>> On Wed, Sep 7, 2016 at 10:52 AM, Bernd Fehling <
>> bernd.fehl...@uni-bielefeld.de> wrote:
>>
>>> Hi list,
>>>
>>> while going from SOLR 4.10.4 to 5.5.3 I noticed a change in query
>> parsing.
>>> 4.10.4
>>> text:star text:trek
>>>   text:star text:trek
>>>   (+((text:star text:trek)~2))/no_coord
>>>   +((text:star text:trek)~2)
>>>
>>> 5.5.3
>>> text:star text:trek
>>>   text:star text:trek
>>>   (+(+text:star +text:trek))/no_coord
>>>   +(+text:star +text:trek)
>>>
>>> There are very many new features and changes between this two
>>> versions.
>>> It looks like a change in query parsing.
>>> Can someone point me to the solr or luc

Re: requestlog jetty param in solr 5.x

2016-09-14 Thread Shawn Heisey
On 9/14/2016 9:09 AM, Rajesh Hazari wrote:
> solr version: 5.5.0
>
> I was checking to see if there is any quick solution for embedded jetty can
> log request access logs too.

In server/etc/jetty.xml, there is a commented configuration section that
creates a request log.  Just uncomment it and then restart Solr.

The following paste URL shows what the section looks like in the 6.0.0
config.  It would look virtually the same in the 5.5.0 config:

http://apaste.info/dr2

To uncomment it, remove the lines that are numbered 4 and 26 on that URL.

Thanks,
Shawn



Re: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Erick Erickson
That's strange

Is there any chance that the schema changed? This is _really_ a shot
in the dark, but perhaps the optimize "normalized" the field
definitions stored with each segment.

Imagine segments 1-5 have one definition, and segments 6-10 have a
different definition for your field. Optimize would have to resolve
this somehow, perhaps that process made the magic happen?

NOTE: I'm not conversant with the internals of merge, so this may be
totally bogus..

Best,
Erick

On Wed, Sep 14, 2016 at 6:15 AM, Markus Jelsma
 wrote:
> Hello - we've just spotted the weirdest issue on Solr 6.1.
>
> We have a Solr index full of logs, new items are added every few minutes. We 
> also have an application that shows charts based on what's in the index, 
> Banana style.
>
> Yesterday we saw facets for a specific field were missing. Today we checked 
> it out until we reduced the facet query just to facet=true&facet.field=FIELD, 
> but it returned nothing of use, just an empty set of facets.
>
> My colleague suggested the crazy idea to optimize the index, i protested 
> because it is no use, numDoc always equals maxDoc and the optimize button was 
> missing anyway. So i forced an optimize via the URL, and it worked, the 
> facets for that field are now back!
>
> Any ideas? Is there a related ticket?
>
> Thanks,
> Markus


Re: Solr kerberos

2016-09-14 Thread Erick Erickson
There is not nearly enough information here to begin to answer your
question. You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick


On Wed, Sep 14, 2016 at 7:44 AM, selvakumar natarajan
 wrote:
> Team,
>
> I am trying to set up Solr 5.4.1 in our organization; we have a
> Kerberos-enabled ZooKeeper. When I enable authentication in Solr, I am not
> able to start Solr. It complains no author for /oversees .
>
>
> Regards
>
> Selvakumar. N


requestlog jetty param in solr 5.x

2016-09-14 Thread Rajesh Hazari
Hi All,

solr version: 5.5.0

I was checking to see if there is any quick way to make the embedded Jetty
log request access logs too.

After some googling I found documentation saying that there is a
configuration option, --add-to-startd=requestlog, which Jetty looks for and
which creates access logs in {$jetty.base}.

Initially I tried the -a option at startup, for example:
./solr restart -m 4g -s ${solr.home} -V -a "--add-to-startd=requestlog"
which did not work; basically the -a option is only used to pass Solr params.

Then I went on to change my ./solr script with
 SOLR_JETTY_CONFIG+=("--add-to-startd=requestlog")
and added an echo statement which prints the Jetty config params (with the
verbose option -V):

if [ "$SOLR_JETTY_CONFIG" != "" ]; then
  echo -e "SOLR_JETTY_CONFIG= ${SOLR_JETTY_CONFIG[@]}"
fi

when i restart the solr server
./solr restart -m 4g -s ${solr.home} -V
the above change prints the below jetty config params, but server does not
start.

can't we pass extra arguments to jetty server from solr start up script or
is there other place that we have to change?


*Thanks,*
*Rajesh**.*


Solr kerberos

2016-09-14 Thread selvakumar natarajan
Team,

I am trying to set up Solr 5.4.1 in our organization; we have a
Kerberos-enabled ZooKeeper. When I enable authentication in Solr, I am not
able to start Solr. It complains no author for /oversees .


Regards

Selvakumar. N


JSON Facets and excluded tags - not working for empty results

2016-09-14 Thread Stefan Matheis
I’m not entirely sure i’m describing the correct problem here - for now it 
looks like the only way it occurs and i hope it’s not misleading any pointers 
that would be helpful. so in case you think i got it wrong, please say so

I have two documents in the index [{"source":"foo"}, {"source":"bar"}] where 
source is a simple string field (indexed as well as stored, if that’ll matter).

Using

> ?q=*:*
> &fq={!tag=source}source:"meh"
> &json.facet={"source":{"type":"terms","field":"source","domain":{"excludeTags":"source"}}}

where meh is a value that is not available for source, i get no results 
(expected) but no facets as well - which is rather unexpected to me. as soon as 
i go with source:"bar" (or something else that yields at least one record) i’m 
getting a record back and facets as well.

which is why i’ve started off with the idea that there might be a correlation 
between those things. verifying the situation using the old facet approach i 
always get the expected facets back, no matter if the result is empty or not.
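For anyone wanting to reproduce the comparison, here is a small sketch that builds both requests as parameter sets. The JSON Facet part mirrors the request shown above; the "old facet approach" is assumed here to be `facet.field` with an `{!ex=source}` local param, which is a guess at what was used, and the function names are made up.

```python
import json
from urllib.parse import urlencode, parse_qs


def tagged_filter(value):
    """Filter query tagged as 'source' so facets can exclude it."""
    return '{!tag=source}source:"%s"' % value


def json_facet_params(filter_value):
    """Request parameters using the JSON Facet API with excludeTags."""
    facet = {
        "source": {
            "type": "terms",
            "field": "source",
            "domain": {"excludeTags": "source"},
        }
    }
    return {
        "q": "*:*",
        "fq": tagged_filter(filter_value),
        "json.facet": json.dumps(facet, separators=(",", ":")),
    }


def legacy_facet_params(filter_value):
    """Assumed equivalent using classic faceting and the {!ex=...} local param."""
    return {
        "q": "*:*",
        "fq": tagged_filter(filter_value),
        "facet": "true",
        "facet.field": "{!ex=source}source",
    }


# URL-encoded query strings, ready to append to a /select handler URL.
json_query = urlencode(json_facet_params("meh"))
legacy_query = urlencode(legacy_facet_params("meh"))
print(json_query)
print(legacy_query)
```

Running both query strings against the same core with a non-matching and a matching filter value makes the difference in facet output easy to compare side by side.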

Or isn’t it supposed to work like this anymore and i’m the guy who didn’t get 
the memo?

Thanks
Stefan



Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Markus Jelsma
Hello - we've just spotted the weirdest issue on Solr 6.1.

We have a Solr index full of logs, new items are added every few minutes. We 
also have an application that shows charts based on what's in the index, Banana 
style.

Yesterday we saw facets for a specific field were missing. Today we checked it 
out until we reduced the facet query just to facet=true&facet.field=FIELD, but 
it returned nothing of use, just an empty set of facets.

My colleague suggested the crazy idea to optimize the index, i protested 
because it is no use, numDoc always equals maxDoc and the optimize button was 
missing anyway. So i forced an optimize via the URL, and it worked, the facets 
for that field are now back!
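A sketch of the two requests involved, in case someone wants to try the same sequence. The core URL is a placeholder and `FIELD` stands in for the real field name, as in the text above; `optimize=true` on the update handler is the standard way to force an optimize over HTTP.

```python
from urllib.parse import urlencode


def facet_url(core_url, field):
    """Reduced facet query: match everything, return no rows, facet on one field."""
    params = {"q": "*:*", "rows": 0, "facet": "true", "facet.field": field}
    return f"{core_url}/select?{urlencode(params)}"


def optimize_url(core_url):
    """Update-handler request that forces an optimize (merge down the segments)."""
    return f"{core_url}/update?{urlencode({'optimize': 'true'})}"


# Hypothetical core URL, for illustration only.
core = "http://localhost:8983/solr/logs"
print(facet_url(core, "FIELD"))
print(optimize_url(core))
```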

Any ideas? Is there a related ticket?

Thanks,
Markus


Re: Miserable Experience Using Solr. Again.

2016-09-14 Thread Shawn Heisey
On 9/13/2016 5:42 PM, Aaron Greenspan wrote:
> I get this on digest mode (and wasn’t even sure my initial message
> went through to the list), so please forgive the delay in responding. 

I've added you as BCC so you'll get this as soon as I send it.  I wrote
most of it last night, and left it to complete in the morning -- and now
I see that Jan has replied with similar information.

> I think the various reactions to my post suggest that a sizable number
> of users (and by "users" I mean those who are not affiliated with
> Apache and who are not core contributors) find Solr difficult to use.
> For me, this was confirmed many months ago when a family friend—a
> non-technical CEO twice my age of a company recently acquired for a
> very sizable sum—came over for dinner and without any prompting from
> anyone began complaining about this impossible program at work called
> Solr that none of his engineers could get to work. By his telling, he
> had several experienced engineers working on it. 

I've been using Solr for about six years now.  When I first got started,
I spent a HUGE amount of time figuring out the most basic things, and I
asked plenty of dumb questions right here on this list.  I think it took
me about three days to get from that initial download of the 1.4.0
archive to a working server that had something besides "collection1" on
it.  It took another month or so beyond that before I could demonstrate
anything usable to my team, and after that had to start writing tools
that would actually create the index without manual intervention.  One
of those tools was an init script.  Now Solr will install an init script
on Unix-like operating systems.

My active production indexes are running on a couple of different 4.x
versions.  I have production 5.x indexes on servers serving a hot
standby role, but they have not been fully vetted, so the primaries
remain on older versions.  It'll be a while before I get around to 6.x.

> I’m aware that issues with Java are not Solr’s fault. But most
> programs still manage to gracefully fail when they are missing a
> dependency, and then clearly report what’s missing. If you’re not
> actually a Java programmer, which I am not, "major.minor 52.0" (for
> example) is meaningless gibberish. "Please download and install JRE
> 1.8 to run this software" would be considerably clearer. How is it
> that Solr can search through millions of files, but it can’t do that? 

I know that in the 5.x days, we had Java version detection in the start
script, so that the start would complain if certain buggy versions of
Java 7 were detected.  I think it would even refuse to start if the
version wasn't new enough.  If we have lost that with 6.x, that needs to
go back in, and we will look at that problem immediately.

On password security:  I hear you.  Part of the issue is that Solr can't
*directly* do security.  It's sitting behind another piece of software
that handles the network and HTTP -- Jetty.  Until recently, Solr really
didn't touch the servlet container, allowing it to do its thing
according to its config files.  Part of this was due to the fact that
before 5.0, we did not know what container was being used -- the user
had the option of deploying in several different containers, and none of
them handled security in quite the same way.  Since 5.0, the only
officially supported container is the Jetty that Solr includes, so we
CAN put container-specific code into Solr.  This is why 5.3 and later
have good support for authentication.

TL;DR info:  When you password protect Solr, the admin UI actually
doesn't get protected.  It is nothing more than static HTML, CSS,
Javascript, and images.  The admin UI actually runs in your browser, not
on the server.  What gets password protection is the HTTP API used for
information, queries, and updates.

You're absolutely right that our documentation and error messages are
completely inadequate for a novice user.  The error messages sometimes
aren't even adequate for an experienced Java developer to know what went
wrong, at least not without examining the source code.

> As for Bram Van Dam’s question about how a settings database would
> work, I don’t think it’s worth getting too specific here, but my
> general response would be, if you need a good model for how to widely
> deploy software—not a perfect model, but a good one—look at WordPress.
> A lot of people use WordPress. Like any software, it has its flaws.
> But average people are able to sign in, with a password (!), change
> their admin settings, and save those settings I’m pretty certain to a
> MySQL schema. I’d love to be able to do that with Solr. 

I concur with what Alexandre said about Wordpress compared to Solr.  The
target audience and deployment method are quite different ... but I take
your point too -- we can learn a lot from projects like WordPress, which
has had to address "first contact" issues in their documentation.

The addition of Zookeeper capability to Solr in versio

Re: help with field definition

2016-09-14 Thread Emir Arnautovic

Hi Gandham,

It seems to me that you need exact matches on singerName so it should be 
untokenized - use KeywordTokenizerFactory. If you want to make it case 
insensitive, add LowerCaseFilterFactory and that's for indexing.


The query analysis chain can use the standard tokenizer, 
LowerCaseFilterFactory if you want search to be case insensitive, and 
ShingleFilterFactory with shingle sizes from the minimum to the maximum 
number of name parts in singerName. It is probably a wrong assumption, but 
assuming names are FirstName LastName, you can use maxShingleSize=2, and 
for your first example it will produce: 'my fav', 'fav artist', 
'artist justin', 'justin beiber'...


You can tweak this depending on other requirements.
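The idea can be simulated outside Solr. The sketch below is a toy model, not the real analysis chain: a regex stands in for the standard tokenizer, the index side keeps the whole lowercased name as one token (as KeywordTokenizerFactory plus LowerCaseFilterFactory would), and the query side emits unigrams and two-word shingles. All function names are made up.

```python
import re


def tokenize(text):
    """Rough stand-in for standard tokenizer + lowercase filter."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, max_size=2):
    """Unigrams plus word n-grams up to max_size, joined by a single space."""
    out = set()
    for size in range(1, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.add(" ".join(tokens[i:i + size]))
    return out


def matching_docs(query, docs):
    """A doc matches when its whole lowercased name equals one query shingle."""
    candidates = shingles(tokenize(query))
    return [doc_id for doc_id, name in docs.items() if name.lower() in candidates]


docs = {"Doc1": "Justin Beiber", "Doc2": "Justin Timberlake"}

print(matching_docs("My fav artist Justin Beiber is very impressive", docs))
print(matching_docs("I have a Justin Timberlake poster on my wall", docs))
print(matching_docs("The name Bieber Justin is unique", docs))
print(matching_docs("Timberlake is a lake of timber..?", docs))
```

Because the indexed value is a single token, a lone "Timberlake" or a reversed name never equals it, which is the behaviour asked for in the four example queries.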

HTH,
Emir


On 13.09.2016 23:27, Gandham, Satya wrote:

HI,

   I need help with defining a field ‘singerName’ with the right 
tokenizers and filters such that it gives me the below described behavior:

I have a few documents as given below:

Doc 1
   singerName: Justin Beiber
Doc 2:
   singerName: Justin Timberlake
…


Below is the list of queries and the corresponding matches:

Query 1: “My fav artist Justin Beiber is very impressive”
Docs Matched : Doc1

Query 2: “I have a Justin Timberlake poster on my wall”
Docs Matched: Doc2

Query 3: “The name Bieber Justin is unique”
Docs Matched: None

Query 4: “Timberlake is a lake of timber..?”
Docs Matched: None.

I have described this in a bit more detail here: 
http://stackoverflow.com/questions/39399321/solr-shingle-query-matching-keyword-tokenized-field

I’d appreciate any help in addressing this problem.

Thanks !!



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Miserable Experience Using Solr. Again.

2016-09-14 Thread Jan Høydahl
> 14. sep. 2016 kl. 01.42 skrev Aaron Greenspan :

First of all, thanks for spending some time to give feedback and opening JIRAs 
(even if some get closed because it is a question, not a bug report).
This list is exactly the right forum to bring up frustrations newbie users 
might have with Solr, and I think we should LISTEN carefully and identify low 
hanging fruits, as Alexandre is also focusing on!

> I’m aware that issues with Java are not Solr’s fault. But most programs still 
> manage to gracefully fail when they are missing a dependency, and then 
> clearly report what’s missing. If you’re not actually a Java programmer, 
> which I am not, "major.minor 52.0" (for example) is meaningless gibberish. 
> "Please download and install JRE 1.8 to run this software" would be 
> considerably clearer. How is it that Solr can search through millions of 
> files, but it can’t do that?

I agree that it would help big time if bin/solr would validate correct Java 
version. Found https://issues.apache.org/jira/browse/SOLR-8080 for this, will 
try to cook up something :) 
Also, over in https://issues.apache.org/jira/browse/SOLR-9508 I added a check 
for Java, so it will prompt you to install Java before you can install Solr. 
Should perhaps check for min-version here as well.
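To illustrate the kind of message being asked for, here is a rough sketch that turns the "major.minor 52.0" number into something readable. It is not what bin/solr does; the function names and wording are invented. The only fact it relies on is that the class file major version is the Java release number plus 44 (so 52 corresponds to Java 8).

```python
def java_release_for_class_version(major):
    """Class file major version 52 is Java 8, 51 is Java 7, and so on."""
    return major - 44


def friendly_version_error(major_minor):
    """Translate e.g. '52.0' from an UnsupportedClassVersionError into plain words."""
    major = int(major_minor.split(".")[0])
    release = java_release_for_class_version(major)
    return (
        f"Please install Java {release} or later to run this software "
        f"(class file version {major_minor})."
    )


print(friendly_version_error("52.0"))
```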

> 1. I did. The documentation is severely lacking, apparently having been 
> written by project contributors who have vastly different goals than their 
> users. 

Yea, the ref-guide is a huge beast and aims to list every single setting.
Then we have the tutorials that aim to walk new users through installing, 
indexing and searching. But they don’t cover upgrading etc of course.
Then of course you have all the books - which is perhaps the best option right 
now to get quickly up to speed..

If you could decide, what kind of documentation would you want from the 
project? A very short “Solr Quick start guide”? with step-by-step instructions 
for the most common tasks from a User perspective?

> Note the red section at the bottom (which originally wasn’t even there): "No 
> Solr API, including the Admin UI, is designed to be exposed to non-trusted 
> parties. Tune your firewall so that only trusted computers and people are 
> allowed access." If one of my employees tried to pull this I would fire them. 
> Admin UIs in every other product I’ve ever seen are password-protected. 
> Always. Netscape Enterprise Server in 1996 had a password for its admin UI.

The warning is an honest way to tell admins that Solr is not designed to be an 
internet-facing program, like httpd or nginx. That is not to say that you 
cannot secure Solr pretty well with what we already have, but there will 
probably be a bunch of security holes since being an internet-facing service 
is not the goal of Solr. Nor is it an excuse for not having password 
protection that is easier to understand.

Still, the truth is that you CAN add authentication to all of Solr, including 
the UI. What is confusing though, is that the static (non-sensitive) parts of 
the UI will load and display, but as soon as the UI attempts to request any 
kind of information from the Solr APIs, it will fail.

In my opinion this is perceived by our users as the Admin UI being insecure, 
and even if technically not true, we should continue work on 
https://issues.apache.org/jira/browse/SOLR-7896 (Add a login page for Solr 
Administrative Interface). 

> 2. I have filed several reports on JIRA. Here’s the kind of response I have 
> received in the past:

Again, thanks for contributing all of this, without users who care and suggest 
stuff there would be no progress…
And I apologise if we as a community have not been understanding or had a 
welcoming attitude to the suggestions.

> https://issues.apache.org/jira/browse/SOLR-7896?focusedCommentId=14661324&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14661324

Security is a special case I would say, since the project had an “official” 
attitude that we’d not add any kind of security whatsoever, to not mislead 
people into exposing Solr publicly. But then things changed in 5.2 and all the 
new security stuff got no vetoes anymore, and we’re now in a pretty good 
position on the API level of security. Wrt SOLR-7896, I re-opened that one 
after it got prematurely closed. But apparently not enough users have felt the 
need for it for someone to invest the time or money it takes to implement it. 
And that’s how open source works, as I’m sure you know.

> Lastly, I run a non-profit foundation devoted to transparency, and I think 
> Solr could do a lot to help further my foundation's goals. That’s why I’m 
> using it at all. It’s the kind of project I’d be willing to fund (since I 
> don’t think I can write the code myself in this instance)—except that very 
> few people working on

Re: [Result Query Solr] How to retrieve the content of pdfs

2016-09-14 Thread Alexandre Rafalovitch
The extracted content goes into text field which is not stored. You can
make it stored but the output will really not be pretty. PDF is not a
linear storage format.

Regards,
Alex

On 14 Sep 2016 5:16 AM, "Alexandre Martins" 
wrote:

> Hi Guys,
>
> I'm trying to use the latest version of solr and i have used the post tool to
> upload 28 pdf files and it works fine. However, I don't know how to show
> the content of the files in the resulting json. Does anybody know how to
> include this field?
>
> "responseHeader":{ "zkConnected":true, "status":0, "QTime":43,
>   "params":{ "q":"ABC", "indent":"on", "wt":"json", "_":"1473804420750"}},
> "response":{"numFound":40,"start":0,"maxScore":9.1066065,"docs":[ {
>   "id":"/home/alexandre/desenvolvimento/workspace/solr-6.2.0/pdfs_hack/abc.pdf",
>   "date":["2016-09-13T14:44:17Z"],
>   "pdf_pdfversion":[1.5],
>   "xmp_creatortool":["PDFCreator Version 1.7.3"],
>   "stream_content_type":["application/pdf"],
>   "access_permission_modify_annotations":[false],
>   "access_permission_can_print_degraded":[false],
>   "dc_creator":["abc"],
>   "dcterms_created":["2016-09-13T14:44:17Z"],
>   "last_modified":["2016-09-13T14:44:17Z"],
>   "dcterms_modified":["2016-09-13T14:44:17Z"],
>   "dc_format":["application/pdf; version=1.5"],
>   "title":["ABC tittle"],
>   "xmpmm_documentid":["uuid:100ccff2-7c1c-11e6--ab7b62fc46ae"],
>   "last_save_date":["2016-09-13T14:44:17Z"],
>   "access_permission_fill_in_form":[false],
>   "meta_save_date":["2016-09-13T14:44:17Z"],
>   "pdf_encrypted":[false],
>   "dc_title":["Tittle abc"],
>   "modified":["2016-09-13T14:44:17Z"],
>   "content_type":["application/pdf"],
>   "stream_size":[101948],
>   "x_parsed_by":["org.apache.tika.parser.DefaultParser",
>     "org.apache.tika.parser.pdf.PDFParser"],
>   "creator":["mauricio.tostes"],
>   "meta_author":["mauricio.tostes"],
>   "meta_creation_date":["2016-09-13T14:44:17Z"],
>   "created":["Tue Sep 13 14:44:17 UTC 2016"],
>   "access_permission_extract_for_accessibility":[false],
>   "access_permission_assemble_document":[false],
>   "xmptpg_npages":[3],
>   "creation_date":["2016-09-13T14:44:17Z"],
>   "resourcename":["/home/alexandre/desenvolvimento/workspace/solr-6.2.0/pdfs_hack/abc.pdf"],
>   "access_permission_extract_content":[false],
>   "access_permission_can_print":[false],
>   "author":["abc.add"],
>   "producer":["GPL Ghostscript 9.10"],
>   "access_permission_can_modify":[false],
>   "_version_":1545395897488113664},
>
> Alexandre Costa Martins
> DATAPREV - IT Analyst
> Software Reuse Researcher
> MSc Federal University of Pernambuco
> RiSE Member - http://www.rise.com.br
> Sun Certified Programmer for Java 5.0 (SCPJ5.0)
>
> MSN: xandecmart...@hotmail.com
> GTalk: alexandremart...@gmail.com
> Skype: xandecmartins
> Mobile: +55 (85) 9626-3631
>


Re: Unable to connect to correct port in solr 6.2.0

2016-09-14 Thread Jan Høydahl
Thanks for the description.

I updated https://issues.apache.org/jira/browse/SOLR-9475 
 for this, and my patch there 
now detects the guest distro, not the host.
The update-rc.d error is simply because of wrong detection of distro.

I also created https://issues.apache.org/jira/browse/SOLR-9508 to give better
error messages for a missing /usr/sbin/service command.
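
To illustrate the idea (this is only a rough sketch, not the actual patch in
SOLR-9475 or SOLR-9508): /proc/version describes the kernel, which inside
Docker belongs to the host, so the release files in the container's own
filesystem are a more reliable source. The function names detect_distro and
require_cmd and the optional root-prefix argument are hypothetical and exist
only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; NOT the code from install_solr_service.sh.

# Guess the distro from release files in the (guest) filesystem instead of
# /proc/version, which reports the kernel build of the Docker host.
detect_distro() {
  local root="${1:-}"   # optional filesystem root prefix (assumption, eases testing)
  if [ -f "$root/etc/os-release" ]; then
    # os-release holds KEY=value lines; print the value of ID (e.g. rhel, ubuntu)
    sed -n 's/^ID=//p' "$root/etc/os-release" | tr -d '"' | head -n 1
  elif [ -f "$root/etc/redhat-release" ]; then
    echo "rhel"
  elif [ -f "$root/etc/debian_version" ]; then
    echo "debian"
  else
    echo "unknown"
  fi
}

# Fail with a readable message when a required command is missing,
# rather than letting the script hit "command not found" later on.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "ERROR: required command '$1' not found in PATH" >&2
    return 1
  }
}

# Example use:
#   distro=$(detect_distro)
#   require_cmd service || exit 1
```

On a trimmed-down image like the one above, require_cmd service would stop
early with a clear message instead of failing at line 326 of the installer.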

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 12 Sep 2016 at 22:28, Kevin Risden wrote:
> 
> Jan - the issue you are hitting is that inside Docker, /proc/version reports
> the underlying host kernel and not what you would expect from the Docker
> container. The errors for update-rc.d and service occur because the Docker
> image you are using is trimmed down.
> 
> Kevin Risden
> 
> On Mon, Sep 12, 2016 at 3:19 PM, Jan Høydahl wrote:
> 
>> I tried it on a Docker RHEL system (gidikern/rhel-oracle-jre) and the
>> install failed with errors
>> 
>> ./install_solr_service.sh: line 322: update-rc.d: command not found
>> ./install_solr_service.sh: line 326: service: command not found
>> ./install_solr_service.sh: line 328: service: command not found
>> 
>> Turns out that /proc/version returns “Ubuntu” on this system:
>> Linux version 4.4.19-moby (root@3934ed318998) (gcc version 5.4.0 20160609
>> (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #1 SMP Thu Sep 1 09:44:30 UTC 2016
>> There is also a /etc/redhat-release file:
>> Red Hat Enterprise Linux Server release 7.1 (Maipo)
>> 
>> So the install of rc.d failed completely because of this. Don’t know if
>> this is common on RHEL systems, perhaps we need to improve distro detection
>> in installer?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> On 12 Sep 2016 at 21:31, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>> 
>>> I just tried this out on ubuntu (sorry I don't have access to a red hat
>>> system) and it works fine.
>>> 
>>> One thing that you have to take care of is that if you install the service
>>> on the default 8983 port, then trying to upgrade with the same tar to a
>>> different port does not work. So please ensure that you hadn't already
>>> installed the service before.
>>> 
>>> On Tue, Sep 13, 2016 at 12:53 AM, Shalin Shekhar Mangar <
>>> shalinman...@gmail.com> wrote:
>>> 
 Which version of red hat? Is lsof installed on this system?
 
 On Mon, Sep 12, 2016 at 4:30 PM, Preeti Bhat 
 wrote:
 
> Hi all,
> 
> I am trying to set up Solr on Red Hat Linux, using the
> install_solr_service.sh script from the solr-6.2.0 tgz. The script runs and
> starts Solr on port 8983 even when the port is explicitly specified
> as 2016.
> 
> /root/install_solr_service.sh solr-6.2.0.tgz -i /opt -d /var/solr -u root -s solr -p 2016
> 
> Is this the correct way to set up Solr on Linux? Also, I have observed that if
> I go to /bin/solr and start it with the port number, it works as
> expected, but not as a service.
> 
> I would like to set up Solr in SolrCloud mode with external ZooKeepers.
> 
> Could someone please advise on this?
> 
 
 
 --
 Regards,
 Shalin Shekhar Mangar.
 
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>> 
>>