Re: LTR - OriginalScore query issue

2018-03-15 Thread ilayaraja
I do have the features defined as below for field-specific (title, etc.)
matching:

features: [
{
name: "productNewness",
class: "org.apache.solr.ltr.feature.SolrFeature",
params: {
q: "{!func}recip( ms(NOW,launchdate_pl), 3.16e-11, 1, 1)"
},
store: "myFeatureStoreDemo",
},
{
name: "originalScore",
class: "org.apache.solr.ltr.feature.OriginalScoreFeature",
params: null,
store: "myFeatureStoreDemo",
},
{
name: "productTitleMatchGuestQuery",
class: "org.apache.solr.ltr.feature.SolrFeature",
params: {
q: "{!dismax qf=p_title}${user_query}"
},
store: "myFeatureStoreDemo",
}
]



What I am looking for is the original score for that document given the
query (efi.user_query) before re-ranking, i.e. as per the "qf" defined in the
request handler, which matches the query against different fields:
qf=item_typel^3.0 brand^2.0 title^5.0
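
For illustration, a feature along these lines is the kind of thing I have
in mind (a sketch only -- the feature name is made up, the edismax weights
mirror the request handler's qf, and user_query comes in via efi):

curl -XPUT 'http://localhost:8983/solr/mycollection/schema/feature-store' \
  -H 'Content-type:application/json' --data-binary '[
  {
    "name": "originalQfScore",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!edismax qf=\"item_typel^3.0 brand^2.0 title^5.0\"}${user_query}" },
    "store": "myFeatureStoreDemo"
  }
]'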





-
--Ilay
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: statistics in hitlist

2018-03-15 Thread Joel Bernstein
I've been working on the user guide for the math expressions. Here is the
page on regression:

https://github.com/joel-bernstein/lucene-solr/blob/math_expressions_documentation/solr/solr-ref-guide/src/regression.adoc

This page is part of the larger math expression documentation. The TOC is
here:

https://github.com/joel-bernstein/lucene-solr/blob/math_expressions_documentation/solr/solr-ref-guide/src/math-expressions.adoc

The docs are still very rough but you can get an idea of the coverage.



Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 15, 2018 at 10:26 PM, Joel Bernstein  wrote:

> If you want to get everything in query you can do this:
>
> let(echo="d,e",
>  a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO
> *]",
> fq="isParent:true", rows="150",
> fl="id,oil_first_90_days_production,oil_last_30_days_production", sort="id
> asc"),
>  b=col(a, oil_first_90_days_production),
>  c=col(a, oil_last_30_days_production),
>  d=regress(b, c),
>  e=someExpression())
>
> The echo parameter tells the let expression which variables to output.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Mar 15, 2018 at 3:13 PM, Erick Erickson 
> wrote:
>
>> What does the fq clause look like?
>>
>> On Thu, Mar 15, 2018 at 11:51 AM, John Smith 
>> wrote:
>> > Hi Joel, I did some more work on this statistics stuff today. Yes, we do
>> > have nulls in our data; the document contains many fields, we don't
>> always
>> > have values for each field, but we can't set the nulls to 0 either (or
>> any
>> > other value, really) as that will mess up other calculations (such as
>> when
>> > calculating average etc); we would normally just ignore fields with null
>> > values when calculating stats manually ourselves.
>> >
>> > Adding a check in the "q" parameter to ensure that the fields used in
>> the
>> > calculations are > 0 does work now. Thanks for the tip (and sorry,
>> should
>> > have caught that myself). But I am unable to use "fq" for these checks,
>> > they have to be added to the q instead. Adding fq's doesn't have any
>> effect.
>> >
>> >
>> > Anyway, I'm trying to change this up a little. This is what I'm
>> currently
>> > using (switched from "random" to "search" since I actually need the full
>> > hitlist not just a random subset):
>> >
>> > let(a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO
>> *]",
>> > fq="isParent:true", rows="150",
>> > fl="id,oil_first_90_days_production,oil_last_30_days_production",
>> sort="id
>> > asc"),
>> >  b=col(a, oil_first_90_days_production),
>> >  c=col(a, oil_last_30_days_production),
>> >  d=regress(b, c))
>> >
>> > So I have 2 fields there defined, that works great (in terms of a test
>> and
>> > running the query); but I need to replace the second field,
>> > "oil_last_30_days_production" with the avg value in
>> > oil_first_90_days_production.
>> >
>> > I can get the avg with this expression:
>> > stats(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
>> > fq="isParent:true", rows="150", avg(oil_first_90_days_production))
>> >
>> > But I don't know how to push that avg value into the first streaming
>> > expression; guessing I have to set "c=" but that is where I'm
>> getting
>> > lost, since avg only returns 1 value and the first parameter, "b",
>> returns
>> > a list of sorts. Somehow I have to get the avg value stuffed inside a
>> > "col", where it is the same value for every row in the hitlist...?
>> >
>> > Thanks for your help!
>> >
>> >
>> > On Mon, Mar 5, 2018 at 10:50 PM, Joel Bernstein 
>> wrote:
>> >
>> >> I suspect you've got nulls in your data. I just tested with null
>> values and
>> >> got the same error. For testing purposes try loading the data with
>> default
>> >> values of zero.
>> >>
>> >>
>> >> Joel Bernstein
>> >> http://joelsolr.blogspot.com/
>> >>
>> >> On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein 
>> >> wrote:
>> >>
>> >> > Let's break the expression down and build it up slowly. Let's start
>> with:
>> >> >
>> >> > let(echo="true",
>> >> >  a=random(tx_prod_production, q="*:*", fq="isParent:true",
>> rows="15",
>> >> > fl="oil_first_90_days_production,oil_last_30_days_production"),
>> >> >  b=col(a, oil_first_90_days_production))
>> >> >
>> >> >
>> >> > This should return variables a and b. Let's see what the data looks
>> like.
>> >> > I changed the rows from 15 to 15000. If it all looks good we can
>> expand
>> >> the
>> >> > rows and continue adding functions.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Joel Bernstein
>> >> > http://joelsolr.blogspot.com/
>> >> >
>> >> > On Mon, Mar 5, 2018 at 4:11 PM, John Smith 
>> wrote:
>> >> >
>> >> >> Thanks Joel for your help on this.
>> >> >>
>> >> >> What I've done so far:
>> >> >> - unzip downloaded solr-7.2
>> >> >> - modify the _default "managed-schema" to add the random field type
>> and
>> >> >> the dynamic random field
>> >> >> - start solr7 using "solr start -c"
>> >> >> - 

Re: statistics in hitlist

2018-03-15 Thread Joel Bernstein
If you want to get everything in query you can do this:

let(echo="d,e",
 a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
fq="isParent:true", rows="150",
fl="id,oil_first_90_days_production,oil_last_30_days_production", sort="id
asc"),
 b=col(a, oil_first_90_days_production),
 c=col(a, oil_last_30_days_production),
 d=regress(b, c),
 e=someExpression())

The echo parameter tells the let expression which variables to output.
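
For completeness, a minimal sketch of how such an expression is typically
submitted to the /stream handler (host and port are illustrative):

curl --data-urlencode 'expr=let(echo="d,e",
  a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
    fq="isParent:true", rows="150",
    fl="id,oil_first_90_days_production,oil_last_30_days_production",
    sort="id asc"),
  b=col(a, oil_first_90_days_production),
  c=col(a, oil_last_30_days_production),
  d=regress(b, c),
  e=someExpression())' "http://localhost:8983/solr/tx_prod_production/stream"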



Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 15, 2018 at 3:13 PM, Erick Erickson 
wrote:

> What does the fq clause look like?
>
> On Thu, Mar 15, 2018 at 11:51 AM, John Smith  wrote:
> > Hi Joel, I did some more work on this statistics stuff today. Yes, we do
> > have nulls in our data; the document contains many fields, we don't
> always
> > have values for each field, but we can't set the nulls to 0 either (or
> any
> > other value, really) as that will mess up other calculations (such as
> when
> > calculating average etc); we would normally just ignore fields with null
> > values when calculating stats manually ourselves.
> >
> > Adding a check in the "q" parameter to ensure that the fields used in the
> > calculations are > 0 does work now. Thanks for the tip (and sorry, should
> > have caught that myself). But I am unable to use "fq" for these checks,
> > they have to be added to the q instead. Adding fq's doesn't have any
> effect.
> >
> >
> > Anyway, I'm trying to change this up a little. This is what I'm currently
> > using (switched from "random" to "search" since I actually need the full
> > hitlist not just a random subset):
> >
> > let(a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO
> *]",
> > fq="isParent:true", rows="150",
> > fl="id,oil_first_90_days_production,oil_last_30_days_production",
> sort="id
> > asc"),
> >  b=col(a, oil_first_90_days_production),
> >  c=col(a, oil_last_30_days_production),
> >  d=regress(b, c))
> >
> > So I have 2 fields there defined, that works great (in terms of a test
> and
> > running the query); but I need to replace the second field,
> > "oil_last_30_days_production" with the avg value in
> > oil_first_90_days_production.
> >
> > I can get the avg with this expression:
> > stats(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
> > fq="isParent:true", rows="150", avg(oil_first_90_days_production))
> >
> > But I don't know how to push that avg value into the first streaming
> > expression; guessing I have to set "c=" but that is where I'm getting
> > lost, since avg only returns 1 value and the first parameter, "b",
> returns
> > a list of sorts. Somehow I have to get the avg value stuffed inside a
> > "col", where it is the same value for every row in the hitlist...?
> >
> > Thanks for your help!
> >
> >
> > On Mon, Mar 5, 2018 at 10:50 PM, Joel Bernstein 
> wrote:
> >
> >> I suspect you've got nulls in your data. I just tested with null values
> and
> >> got the same error. For testing purposes try loading the data with
> default
> >> values of zero.
> >>
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein 
> >> wrote:
> >>
> >> > Let's break the expression down and build it up slowly. Let's start
> with:
> >> >
> >> > let(echo="true",
> >> >  a=random(tx_prod_production, q="*:*", fq="isParent:true",
> rows="15",
> >> > fl="oil_first_90_days_production,oil_last_30_days_production"),
> >> >  b=col(a, oil_first_90_days_production))
> >> >
> >> >
> >> > This should return variables a and b. Let's see what the data looks
> like.
> >> > I changed the rows from 15 to 15000. If it all looks good we can
> expand
> >> the
> >> > rows and continue adding functions.
> >> >
> >> >
> >> >
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Mon, Mar 5, 2018 at 4:11 PM, John Smith 
> wrote:
> >> >
> >> >> Thanks Joel for your help on this.
> >> >>
> >> >> What I've done so far:
> >> >> - unzip downloaded solr-7.2
> >> >> - modify the _default "managed-schema" to add the random field type
> and
> >> >> the dynamic random field
> >> >> - start solr7 using "solr start -c"
> >> >> - indexed my data using pint/pdouble/boolean field types etc
> >> >>
> >> >> I can now run the random function all by itself, it returns random
> >> >> results as expected. So far so good!
> >> >>
> >> >> However... now trying to get the regression stuff working:
> >> >>
> >> >> let(a=random(tx_prod_production, q="*:*", fq="isParent:true",
> >> >> rows="15000", fl="oil_first_90_days_producti
> >> >> on,oil_last_30_days_production"),
> >> >> b=col(a, oil_first_90_days_production),
> >> >> c=col(a, oil_last_30_days_production),
> >> >> d=regress(b, c))
> >> >>
> >> >> Posted directly into solr admin UI. Run the streaming expression and
> I
> >> >> get this error message:
> >> >> "EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric
> 

Re: statistics in hitlist

2018-03-15 Thread Erick Erickson
What does the fq clause look like?

On Thu, Mar 15, 2018 at 11:51 AM, John Smith  wrote:
> Hi Joel, I did some more work on this statistics stuff today. Yes, we do
> have nulls in our data; the document contains many fields, we don't always
> have values for each field, but we can't set the nulls to 0 either (or any
> other value, really) as that will mess up other calculations (such as when
> calculating average etc); we would normally just ignore fields with null
> values when calculating stats manually ourselves.
>
> Adding a check in the "q" parameter to ensure that the fields used in the
> calculations are > 0 does work now. Thanks for the tip (and sorry, should
> have caught that myself). But I am unable to use "fq" for these checks,
> they have to be added to the q instead. Adding fq's doesn't have any effect.
>
>
> Anyway, I'm trying to change this up a little. This is what I'm currently
> using (switched from "random" to "search" since I actually need the full
> hitlist not just a random subset):
>
> let(a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
> fq="isParent:true", rows="150",
> fl="id,oil_first_90_days_production,oil_last_30_days_production", sort="id
> asc"),
>  b=col(a, oil_first_90_days_production),
>  c=col(a, oil_last_30_days_production),
>  d=regress(b, c))
>
> So I have 2 fields there defined, that works great (in terms of a test and
> running the query); but I need to replace the second field,
> "oil_last_30_days_production" with the avg value in
> oil_first_90_days_production.
>
> I can get the avg with this expression:
> stats(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
> fq="isParent:true", rows="150", avg(oil_first_90_days_production))
>
> But I don't know how to push that avg value into the first streaming
> expression; guessing I have to set "c=" but that is where I'm getting
> lost, since avg only returns 1 value and the first parameter, "b", returns
> a list of sorts. Somehow I have to get the avg value stuffed inside a
> "col", where it is the same value for every row in the hitlist...?
>
> Thanks for your help!
>
>
> On Mon, Mar 5, 2018 at 10:50 PM, Joel Bernstein  wrote:
>
>> I suspect you've got nulls in your data. I just tested with null values and
>> got the same error. For testing purposes try loading the data with default
>> values of zero.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein 
>> wrote:
>>
>> > Let's break the expression down and build it up slowly. Let's start with:
>> >
>> > let(echo="true",
>> >  a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15",
>> > fl="oil_first_90_days_production,oil_last_30_days_production"),
>> >  b=col(a, oil_first_90_days_production))
>> >
>> >
>> > This should return variables a and b. Let's see what the data looks like.
>> > I changed the rows from 15 to 15000. If it all looks good we can expand
>> the
>> > rows and continue adding functions.
>> >
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Mon, Mar 5, 2018 at 4:11 PM, John Smith  wrote:
>> >
>> >> Thanks Joel for your help on this.
>> >>
>> >> What I've done so far:
>> >> - unzip downloaded solr-7.2
>> >> - modify the _default "managed-schema" to add the random field type and
>> >> the dynamic random field
>> >> - start solr7 using "solr start -c"
>> >> - indexed my data using pint/pdouble/boolean field types etc
>> >>
>> >> I can now run the random function all by itself, it returns random
>> >> results as expected. So far so good!
>> >>
>> >> However... now trying to get the regression stuff working:
>> >>
>> >> let(a=random(tx_prod_production, q="*:*", fq="isParent:true",
>> >> rows="15000", fl="oil_first_90_days_producti
>> >> on,oil_last_30_days_production"),
>> >> b=col(a, oil_first_90_days_production),
>> >> c=col(a, oil_last_30_days_production),
>> >> d=regress(b, c))
>> >>
>> >> Posted directly into solr admin UI. Run the streaming expression and I
>> >> get this error message:
>> >> "EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value
>> >> expected but found type java.lang.String for value
>> >> oil_first_90_days_production"
>> >>
>> >> It thinks my numeric field is defined as a string? But when I view the
>> >> schema, those 2 fields are defined as ints:
>> >>
>> >>
>> >> When I run a normal query and choose xml as output format, then it also
>> >> puts "int" elements into the hitlist, so the schema appears to be
>> correct
>> >> it's just when using this regress function that something goes wrong and
>> >> solr thinks the field is string.
>> >>
>> >> Any suggestions?
>> >> Thanks!
>> >>
>> >>
>> >>
>> >> On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein 
>> >> wrote:
>> >>
>> >>> The field type will also need to be in the schema:
>> >>>
>> >>>  
>> >>>
>> >>> 
>> >>>
>> >>>
>> >>> Joel Bernstein
>> >>> http://joelsolr.blogspot.com/
>> >>>
>

Re: Apache commons fileupload migration

2018-03-15 Thread Christopher Schultz
To whom it may concern,

On 3/15/18 8:40 AM, padmanabhan1616 wrote:
> Hi Team,
>
> We are using Apache SOLR-5.2.1 as the index engine for our data analytics
> application. As part of this, SOLR uses commons-fileupload-1.2.1.jar for
> file manipulation. There is a security vulnerability identified in the
> commons-fileupload library: *CVE-2016-1000031 Apache Commons FileUpload:
> DiskFileItem file manipulation*. As per the official notice from the Apache
> Software Foundation, this issue has been addressed in
> commons-fileupload-1.3.3.jar and is available for all dependency vendors.
>
> *Is it good to upgrade commons-fileupload from 1.2.1 to 1.3.3 directly?*
> Please suggest the best way to handle this.
>
> Note - *Currently we don't have any requirements to upgrade Solr, so please
> suggest the best way to handle this vulnerability without upgrading the
> entire SOLR install.*
>
> Thanks,
> Padmanabhan

Have you read the changelog?[1]
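
If it shows nothing that affects your usage, the swap itself is just
replacing the jar. A rough sketch, assuming the default Solr 5.x layout
(verify the paths against your own install):

cp commons-fileupload-1.3.3.jar \
   $SOLR_INSTALL/server/solr-webapp/webapp/WEB-INF/lib/
rm $SOLR_INSTALL/server/solr-webapp/webapp/WEB-INF/lib/commons-fileupload-1.2.1.jar
# restart Solr afterwards and re-run your tests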

-chris

[1] https://commons.apache.org/proper/commons-fileupload/changes-report.html





Re: statistics in hitlist

2018-03-15 Thread John Smith
Hi Joel, I did some more work on this statistics stuff today. Yes, we do
have nulls in our data; the document contains many fields, we don't always
have values for each field, but we can't set the nulls to 0 either (or any
other value, really) as that will mess up other calculations (such as when
calculating average etc); we would normally just ignore fields with null
values when calculating stats manually ourselves.

Adding a check in the "q" parameter to ensure that the fields used in the
calculations are > 0 does work now. Thanks for the tip (and sorry, should
have caught that myself). But I am unable to use "fq" for these checks,
they have to be added to the q instead. Adding fq's doesn't have any effect.


Anyway, I'm trying to change this up a little. This is what I'm currently
using (switched from "random" to "search" since I actually need the full
hitlist not just a random subset):

let(a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
fq="isParent:true", rows="150",
fl="id,oil_first_90_days_production,oil_last_30_days_production", sort="id
asc"),
 b=col(a, oil_first_90_days_production),
 c=col(a, oil_last_30_days_production),
 d=regress(b, c))

So I have 2 fields there defined, that works great (in terms of a test and
running the query); but I need to replace the second field,
"oil_last_30_days_production" with the avg value in
oil_first_90_days_production.

I can get the avg with this expression:
stats(tx_prod_production, q="oil_first_90_days_production:[1 TO *]",
fq="isParent:true", rows="150", avg(oil_first_90_days_production))

But I don't know how to push that avg value into the first streaming
expression; guessing I have to set "c=" but that is where I'm getting
lost, since avg only returns 1 value and the first parameter, "b", returns
a list of sorts. Somehow I have to get the avg value stuffed inside a
"col", where it is the same value for every row in the hitlist...?

Thanks for your help!


On Mon, Mar 5, 2018 at 10:50 PM, Joel Bernstein  wrote:

> I suspect you've got nulls in your data. I just tested with null values and
> got the same error. For testing purposes try loading the data with default
> values of zero.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein 
> wrote:
>
> > Let's break the expression down and build it up slowly. Let's start with:
> >
> > let(echo="true",
> >  a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15",
> > fl="oil_first_90_days_production,oil_last_30_days_production"),
> >  b=col(a, oil_first_90_days_production))
> >
> >
> > This should return variables a and b. Let's see what the data looks like.
> > I changed the rows from 15 to 15000. If it all looks good we can expand
> the
> > rows and continue adding functions.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, Mar 5, 2018 at 4:11 PM, John Smith  wrote:
> >
> >> Thanks Joel for your help on this.
> >>
> >> What I've done so far:
> >> - unzip downloaded solr-7.2
> >> - modify the _default "managed-schema" to add the random field type and
> >> the dynamic random field
> >> - start solr7 using "solr start -c"
> >> - indexed my data using pint/pdouble/boolean field types etc
> >>
> >> I can now run the random function all by itself, it returns random
> >> results as expected. So far so good!
> >>
> >> However... now trying to get the regression stuff working:
> >>
> >> let(a=random(tx_prod_production, q="*:*", fq="isParent:true",
> >> rows="15000", fl="oil_first_90_days_producti
> >> on,oil_last_30_days_production"),
> >> b=col(a, oil_first_90_days_production),
> >> c=col(a, oil_last_30_days_production),
> >> d=regress(b, c))
> >>
> >> Posted directly into solr admin UI. Run the streaming expression and I
> >> get this error message:
> >> "EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value
> >> expected but found type java.lang.String for value
> >> oil_first_90_days_production"
> >>
> >> It thinks my numeric field is defined as a string? But when I view the
> >> schema, those 2 fields are defined as ints:
> >>
> >>
> >> When I run a normal query and choose xml as output format, then it also
> >> puts "int" elements into the hitlist, so the schema appears to be
> correct
> >> it's just when using this regress function that something goes wrong and
> >> solr thinks the field is string.
> >>
> >> Any suggestions?
> >> Thanks!
> >> ​
> >>
> >>
> >> On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein 
> >> wrote:
> >>
> >>> The field type will also need to be in the schema:
> >>>
> >>>  
> >>>
> >>> 
> >>>
> >>>
> >>> Joel Bernstein
> >>> http://joelsolr.blogspot.com/
> >>>
> >>> On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein 
> >>> wrote:
> >>>
> >>> > You'll need to have this field in your schema:
> >>> >
> >>> > 
> >>> >
> >>> > I'll check to see if the default schema used with solr start -c has
> >>> this
> >>> > field,

Re: DocValuesField fails if bytes > 32k in solr 7.2.1

2018-03-15 Thread Erick Erickson
No, it wasn't fixed "a long time back" in the sense that you could do
this on any docValues field. Note that JIRA says "(individual codecs
still have their limits, including the default codec)".

Before anyone fixes it, the question is "what is the use-case for
storing such large DocValues fields ?"
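
If the use-case doesn't actually need docValues on that field, one common
workaround is simply turning them off. A sketch via the Schema API -- the
field name "large_field" and its type are assumptions, and a full reindex
is required after changing the definition:

curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/mycollection/schema' --data-binary '{
  "replace-field": {
    "name": "large_field",
    "type": "string",
    "indexed": true,
    "stored": true,
    "docValues": false
  }
}'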

Best,
Erick

On Thu, Mar 15, 2018 at 11:22 AM, Minu Theresa Thomas
 wrote:
> Hello Team,
>
> I am using solr 7.2.1. I am getting an exception while indexing saying that
> "DocValuesField  is too large, must be <= 32766, retry?"
>
> This is my field in my managed schema.
>
>  required="false" stored="true"/>
>
>
> When I checked this lucene ticket -
> https://issues.apache.org/jira/browse/LUCENE-4583, it says its fixed long
> time back.
>
> Can someone please let me know how do I get this fixed?
>
> Thanks and Regards,
> Minu


DocValuesField fails if bytes > 32k in solr 7.2.1

2018-03-15 Thread Minu Theresa Thomas
Hello Team,

I am using solr 7.2.1. I am getting an exception while indexing saying that
"DocValuesField  is too large, must be <= 32766, retry?"

This is my field in my managed schema:

<field name="..." type="..." required="false" stored="true"/>


When I checked this lucene ticket -
https://issues.apache.org/jira/browse/LUCENE-4583, it says its fixed long
time back.

Can someone please let me know how do I get this fixed?

Thanks and Regards,
Minu


RE: SpellCheck Reload

2018-03-15 Thread Alessandro Benedetti
Hi Sadiki,
the kind of spellchecker you are using builds an auxiliary Lucene index as a
supporting data structure.
That index is then used to provide the spellcheck suggestions.

"My question is, does "reloading the dictionary" mean completely erasing the
current dictionary and starting from scratch (which is what I want)? "

What you want is to re-build the spellchecker.
In the case of the IndexBasedSpellChecker, the index is used to build
the dictionary.
When the spellchecker is initialized a reader is opened from the latest
index version available.

If in the meantime your index has changed and commits have happened, just
building the spellchecker *should* use the old reader:

@Override
public void build(SolrCore core, SolrIndexSearcher searcher) throws IOException {
  IndexReader reader = null;
  if (sourceLocation == null) {
    // Load from Solr's index
    reader = searcher.getIndexReader();
  } else {
    // Load from Lucene index at given sourceLocation
    reader = this.reader;
  }

This means your dictionary is not going to see any substantial changes.

So what you need to do is:

1) reload the spellchecker -> which will re-initialise the source for the
dictionary to the latest index commit
2) re-build the dictionary (see the sketch below)
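
A minimal sketch of those two calls, assuming a request handler named
/spell with the spellchecker configured (adjust the names to your setup):

curl "http://localhost:8983/solr/mycollection/spell?q=test&spellcheck=true&spellcheck.reload=true"
curl "http://localhost:8983/solr/mycollection/spell?q=test&spellcheck=true&spellcheck.build=true"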



Cheers







-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Developer needed urgently

2018-03-15 Thread Wenjie Zhang
Responding to a recruiting email and CCing the entire public
solr-user@lucene.apache.org list? Seriously?

On Thu, Mar 15, 2018 at 9:53 AM, John Bickerstaff 
wrote:

> Hi - thanks for thinking of me!
>
> I'm currently lead on the Solr team for Ancestry - and having a good time.
> I might be interested, but moving to New York isn't going to work for me.
> If there is a good chance of working from home, then I might be
> interested...  Let me know...
>
> On Wed, Mar 14, 2018 at 4:14 PM,  wrote:
>
> >
> >
> > Hi,
> >
> > I am Asma Talib from UTG Tech. We are looking for strong Solr Developer
> > with good concepts and skill.
> >
> >
> >
> > Experience: 8 + Years
> >
> > Location: New York
> >
> >
> >
> > This position is for a Senior Search Developer, using Apache Solr for
> > Analytics
> >
> > Following are the Responsibilities of candidate:
> >
> > · Implement automated techniques and processes for the bulk and
> > real time indexing in Apache Solr of large-scale data sets residing in
> > database, Hadoop, flat files and other sources.
> >
> > · Ability to design and implement Solr build for generating indexes
> > against structured and semi-structured data.
> >
> > · Ability to design, create and manage shards and indexes for
> > Solr Cloud.
> >
> > · Ability to write efficient search queries against Solr indexes
> > using Solr REST/Java API.
> >
> > · Prototype and demonstrate new ideas of feasibility and
> > leveraging for Solr Capabilities.
> >
> > · Ability to understand distributed technologies like Hadoop,
> > Teradata, SQL server and ETLs.
> >
> > · Contribute to all aspects of application development including
> > architecture, design, development and support.
> >
> > · Take ownership of application components and ensure timely
> > delivery of the same
> >
> > · Ability to integrate research and best practices into project
> > solution for continuous improvement.
> >
> > · Troubleshoot Solr indexing process and querying engine.
> >
> > · Identify, clarify, resolve issues and risks, escalating them as
> > needed
> >
> > · Ability to Provide technical guidance and mentoring to other
> > junior team members if needed
> >
> > · Ability to write automated unit test cases for solr search
> > engine.
> >
> >
> >
> >
> >
> >
> > *UTG  Tech- Corp*
> >
> > *Asma Talib*
> >
> > *Talent Acquisition Manager*
> >
> > *Website :www.utgtech.com *
> >
> > *Email :* asmata...@utgtech.com
> >
> > *Phone:* (571) 9325184 <(571)%20932-5184>
> >
> > *LinkedIn: *www.linkedin.com/in/asmatalib/
> >
> >
> >
> >
> >
> >
> >
>


Re: FW: Question about Overseer calling SPLITSHARD collection API command during autoscaling

2018-03-15 Thread Cassandra Targett
Hi Matthew -

It's cool to hear you're using the new autoscaling features.

To answer your first question, SPLITSHARD as an action for autoscaling is
not yet supported. As for when it might be, it's the next big gap to fill
in the autoscaling functionality, but there is some work to do first to
make splitting shards faster and safer overall. So, I hope we'll see it in
7.4, but there's a chance it won't be ready until the release after (7.5,
I'd assume).

AFAICT, there isn't a JIRA issue specifically for the SPLITSHARD support
yet, but there will be one relatively soon. There's an umbrella issue for
many of the open tasks if you're interested in that:
https://issues.apache.org/jira/browse/SOLR-9735 (although, it's not an
exhaustive roadmap, I don't think).

I think for the time being if you want/need to split a shard, you'd still
need to do it manually.
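
For reference, the manual route is the SPLITSHARD Collections API call. A
sketch reusing the service name from your script (you choose which shard
to split):

curl -s "solr-service-core:8983/solr/admin/collections?action=SPLITSHARD&collection=${COLLECTION_NAME}&shard=shard1"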

Hope this helps -
Cassandra

On Thu, Mar 15, 2018 at 11:41 AM, Matthew Faw 
wrote:

> I sent this a few mins ago, but wasn’t yet subscribed.  Forwarding the
> message along to make sure it’s received!
>
> From: Matthew Faw 
> Date: Thursday, March 15, 2018 at 12:28 PM
> To: "solr-user@lucene.apache.org" 
> Cc: Matthew Faw , Alex Meijer <
> alex.mei...@verato.com>
> Subject: Question about Overseer calling SPLITSHARD collection API command
> during autoscaling
>
> Hi,
>
> So I’ve been trying out the new autoscaling features in solr 7.2.1.  I run
> the following commands when creating my solr cluster:
>
>
> Set up overseer role:
> curl -s "solr-service-core:8983/solr/admin/collections?action=
> ADDROLE&role=overseer&node=$thenode"
>
> Create cluster prefs:
> clusterprefs=$(cat <<-EOF
> {
> "set-cluster-preferences" : [
>   {"minimize":"sysLoadAvg"},
>   {"minimize":"cores"}
>   ]
> }
> EOF
> )
> echo "The cluster prefs request body is: $clusterprefs"
> curl -H "Content-Type: application/json" -X POST -d
> "$clusterprefs" solr-service-core:8983/api/cluster/autoscaling
>
> Cluster policy:
> clusterpolicy=$(cat <<-EOF
> {
> "set-cluster-policy": [
>   {"replica": 0, "nodeRole": "overseer"},
>   {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
>   {"cores": ">0", "node": "#ANY"},
>   {"cores": "<5", "node": "#ANY"},
>   {"replica": 0, "sysLoadAvg": ">80"}
>   ]
> }
> EOF
> )
> echo "The cluster policy is $clusterpolicy"
> curl -H "Content-Type: application/json" -X POST -d
> "$clusterpolicy" solr-service-core:8983/api/cluster/autoscaling
>
> nodeaddtrigger=$(cat <<-EOF
> {
>  "set-trigger": {
>   "name" : "node_added_trigger",
>   "event" : "nodeAdded",
>   "waitFor" : "1s"
>  }
> }
> EOF
> )
> echo "The node added trigger request: $nodeaddtrigger"
> curl -H "Content-Type: application/json" -X POST -d
> "$nodeaddtrigger" solr-service-core:8983/api/cluster/autoscaling
>
>
> I then create a collection with 2 shards and 3 replicas, under a set of
> nodes in an autoscaling group (initially 4, scales up to 10):
> curl -s "solr-service-core:8983/solr/admin/collections?action=
> CREATE&name=${COLLECTION_NAME}&numShards=${NUM_SHARDS}&
> replicationFactor=${NUM_REPLICAS}&autoAddReplicas=${
> AUTO_ADD_REPLICAS}&collection.configName=${COLLECTION_NAME}&
> waitForFinalState=true"
>
>
> I’ve observed several autoscaling actions being performed – automatically
> re-adding replicas, and moving shards to nodes based on my cluster
> policy/prefs.  However, I have not observed a SPLITSHARD operation.  My
> question is:
> 1) should I expect the Overseer to be able to call the SPLITSHARD command,
> or is this feature not yet implemented?
> 2) If it is possible, do you have any recommendations as to how I might
> force this type of behavior to happen?
> 3) If it’s not implemented yet, when could I expect the feature to be
> available?
>
> If you need any more details, please let me know! Really excited about
> these new features.
>
> Thanks,
> Matthew
>
> The content of this email is intended solely for the individual or entity
> named above and access by anyone else is unauthorized. If you are not the
> intended recipient, any disclosure, copying, distribution, or use of the
> contents of this information is prohibited and may be unlawful. If you have
> received this electronic transmission in error, please reply immediately to
> the sender that you have received the message in error, and delete it.
> Thank you.
>


Re: Solr Developer needed urgently

2018-03-15 Thread John Bickerstaff
Hi - thanks for thinking of me!

I'm currently lead on the Solr team for Ancestry - and having a good time.
I might be interested, but moving to New York isn't going to work for me.
If there is a good chance of working from home, then I might be
interested...  Let me know...

On Wed, Mar 14, 2018 at 4:14 PM,  wrote:

>
>
> Hi,
>
> I am Asma Talib from UTG Tech. We are looking for strong Solr Developer
> with good concepts and skill.
>
>
>
> Experience: 8 + Years
>
> Location: New York
>
>
>
> This position is for a Senior Search Developer, using Apache Solr for
> Analytics
>
> Following are the Responsibilities of candidate:
>
> · Implement automated techniques and processes for the bulk and
> real time indexing in Apache Solr of large-scale data sets residing in
> database, Hadoop, flat files and other sources.
>
> · Ability to design and implement Solr build for generating indexes
> against structured and semi-structured data.
>
> · Ability to design, create and manage shards and indexes for
> Solr Cloud.
>
> · Ability to write efficient search queries against Solr indexes
> using Solr REST/Java API.
>
> · Prototype and demonstrate new ideas of feasibility and
> leveraging for Solr Capabilities.
>
> · Ability to understand distributed technologies like Hadoop,
> Teradata, SQL server and ETLs.
>
> · Contribute to all aspects of application development including
> architecture, design, development and support.
>
> · Take ownership of application components and ensure timely
> delivery of the same
>
> · Ability to integrate research and best practices into project
> solution for continuous improvement.
>
> · Troubleshoot Solr indexing process and querying engine.
>
> · Identify, clarify, resolve issues and risks, escalating them as
> needed
>
> · Ability to Provide technical guidance and mentoring to other
> junior team members if needed
>
> · Ability to write automated unit test cases for solr search
> engine.
>
>
>
>
>
>
> *UTG  Tech- Corp*
>
> *Asma Talib*
>
> *Talent Acquisition Manager*
>
> *Website :www.utgtech.com *
>
> *Email :* asmata...@utgtech.com
>
> *Phone:* (571) 9325184 <(571)%20932-5184>
>
> *LinkedIn: *www.linkedin.com/in/asmatalib/
>
>
>
>
>
>
>


Question about Overseer calling SPLITSHARD collection API command during autoscaling

2018-03-15 Thread Matthew Faw
Hi,

So I’ve been trying out the new autoscaling features in solr 7.2.1.  I run the 
following commands when creating my solr cluster:

Set up overseer role:
curl -s 
"solr-service-core:8983/solr/admin/collections?action=ADDROLE&role=overseer&node=$thenode"

Create cluster prefs:
clusterprefs=$(cat <<-EOF
{
"set-cluster-preferences" : [
  {"minimize":"sysLoadAvg"},
  {"minimize":"cores"}
  ]
}
EOF
)
echo "The cluster prefs request body is: $clusterprefs"
curl -H "Content-Type: application/json" -X POST -d "$clusterprefs" 
solr-service-core:8983/api/cluster/autoscaling

Cluster policy:
clusterpolicy=$(cat <<-EOF
{
"set-cluster-policy": [
  {"replica": 0, "nodeRole": "overseer"},
  {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
  {"cores": ">0", "node": "#ANY"},
  {"cores": "<5", "node": "#ANY"},
  {"replica": 0, "sysLoadAvg": ">80"}
  ]
}
EOF
)
echo "The cluster policy is $clusterpolicy"
curl -H "Content-Type: application/json" -X POST -d 
"$clusterpolicy" solr-service-core:8983/api/cluster/autoscaling

nodeaddtrigger=$(cat <<-EOF
{
 "set-trigger": {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s"
 }
}
EOF
)
echo "The node added trigger request: $nodeaddtrigger"
curl -H "Content-Type: application/json" -X POST -d 
"$nodeaddtrigger" solr-service-core:8983/api/cluster/autoscaling


I then create a collection with 2 shards and 3 replicas, under a set of nodes 
in an autoscaling group (initially 4, scales up to 10):
curl -s 
"solr-service-core:8983/solr/admin/collections?action=CREATE&name=${COLLECTION_NAME}&numShards=${NUM_SHARDS}&replicationFactor=${NUM_REPLICAS}&autoAddReplicas=${AUTO_ADD_REPLICAS}&collection.configName=${COLLECTION_NAME}&waitForFinalState=true"


I’ve observed several autoscaling actions being performed – automatically 
re-adding replicas, and moving shards to nodes based on my cluster 
policy/prefs.  However, I have not observed a SPLITSHARD operation.  My 
question is:
1) should I expect the Overseer to be able to call the SPLITSHARD command, or 
is this feature not yet implemented?
2) If it is possible, do you have any recommendations as to how I might force 
this type of behavior to happen?
3) If it’s not implemented yet, when could I expect the feature to be available?

If you need any more details, please let me know! Really excited about these 
new features.

Thanks,
Matthew

The content of this email is intended solely for the individual or entity named 
above and access by anyone else is unauthorized. If you are not the intended 
recipient, any disclosure, copying, distribution, or use of the contents of 
this information is prohibited and may be unlawful. If you have received this 
electronic transmission in error, please reply immediately to the sender that 
you have received the message in error, and delete it. Thank you.


FW: Question about Overseer calling SPLITSHARD collection API command during autoscaling

2018-03-15 Thread Matthew Faw
I sent this a few mins ago, but wasn’t yet subscribed.  Forwarding the message 
along to make sure it’s received!

From: Matthew Faw 
Date: Thursday, March 15, 2018 at 12:28 PM
To: "solr-user@lucene.apache.org" 
Cc: Matthew Faw , Alex Meijer 
Subject: Question about Overseer calling SPLITSHARD collection API command 
during autoscaling

Hi,

So I’ve been trying out the new autoscaling features in solr 7.2.1.  I run the 
following commands when creating my solr cluster:


Set up overseer role:
curl -s 
"solr-service-core:8983/solr/admin/collections?action=ADDROLE&role=overseer&node=$thenode"

Create cluster prefs:
clusterprefs=$(cat <<-EOF
{
"set-cluster-preferences" : [
  {"minimize":"sysLoadAvg"},
  {"minimize":"cores"}
  ]
}
EOF
)
echo "The cluster prefs request body is: $clusterprefs"
curl -H "Content-Type: application/json" -X POST -d "$clusterprefs" 
solr-service-core:8983/api/cluster/autoscaling

Cluster policy:
clusterpolicy=$(cat <<-EOF
{
"set-cluster-policy": [
  {"replica": 0, "nodeRole": "overseer"},
  {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
  {"cores": ">0", "node": "#ANY"},
  {"cores": "<5", "node": "#ANY"},
  {"replica": 0, "sysLoadAvg": ">80"}
  ]
}
EOF
)
echo "The cluster policy is $clusterpolicy"
curl -H "Content-Type: application/json" -X POST -d 
"$clusterpolicy" solr-service-core:8983/api/cluster/autoscaling

nodeaddtrigger=$(cat <<-EOF
{
 "set-trigger": {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s"
 }
}
EOF
)
echo "The node added trigger request: $nodeaddtrigger"
curl -H "Content-Type: application/json" -X POST -d 
"$nodeaddtrigger" solr-service-core:8983/api/cluster/autoscaling


I then create a collection with 2 shards and 3 replicas, under a set of nodes 
in an autoscaling group (initially 4, scales up to 10):
curl -s 
"solr-service-core:8983/solr/admin/collections?action=CREATE&name=${COLLECTION_NAME}&numShards=${NUM_SHARDS}&replicationFactor=${NUM_REPLICAS}&autoAddReplicas=${AUTO_ADD_REPLICAS}&collection.configName=${COLLECTION_NAME}&waitForFinalState=true"


I’ve observed several autoscaling actions being performed – automatically 
re-adding replicas, and moving shards to nodes based on my cluster 
policy/prefs.  However, I have not observed a SPLITSHARD operation.  My 
question is:
1) should I expect the Overseer to be able to call the SPLITSHARD command, or 
is this feature not yet implemented?
2) If it is possible, do you have any recommendations as to how I might force 
this type of behavior to happen?
3) If it’s not implemented yet, when could I expect the feature to be available?

If you need any more details, please let me know! Really excited about these 
new features.

Thanks,
Matthew

The content of this email is intended solely for the individual or entity named 
above and access by anyone else is unauthorized. If you are not the intended 
recipient, any disclosure, copying, distribution, or use of the contents of 
this information is prohibited and may be unlawful. If you have received this 
electronic transmission in error, please reply immediately to the sender that 
you have received the message in error, and delete it. Thank you.


Re: Some performance questions....

2018-03-15 Thread Alessandro Benedetti
*Single Solr Instance VS Multiple Solr Instances on a Single Server*

I think there is no benefit in having multiple Solr instances on a single
server, unless the heap memory required by the JVM is too big.
And remember that this has relatively little to do with the index size (the
inverted index is memory-mapped OFF heap, and docValues as well).
On the other hand, of course, Apache Solr uses plenty of JVM heap memory as
well (caches, temporary data structures during indexing, etc.)

> Deepak: 
> 
> Well it's kinda a given that when running ANYTHING under a VM you have an
> overhead..

***Deepak*** 
You mean you are assuming without any facts (performance benchmarks with and
without a VM)
 ***Deepak*** 
I think Shawn detailed this quite extensively. I am no sysadmin or OS
expert, but there is no need for benchmarks and I don't even understand your
doubts.
In information technology, any time you add additional layers of software you
need adapters, which means additional instructions executed.
It is obvious that:
metal -> OS -> APP is cheaper, instruction-wise, than
metal -> OS -> VM -> APP
The APP will execute instructions in the VM, which is responsible for
translating those instructions for the underlying OS.
Going direct, you skip one step.
You can think about this in terms of OS emulation: is it cheaper to run
Windows applications on Windows installed directly on the machine, or in a
Windows VM running on top of another OS?



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [nesting] Any way to return the whole hierarchical structure when doing Block Join queries?

2018-03-15 Thread Jan Høydahl
> 14. mar. 2018 kl. 15:45 skrev Anshum Gupta :
> 
> Hi Jan,
> 
> The way I remember it was done (or at least we did it) is by storing the
> depth information as a field in the document using an update request
> processor and using a custom transformer to reconstruct the original
> multi-level document from it.

Right, you either have to store a pointer to the parent or depth info.
If you only have depth (1, 2, 3) it will not be possible to reconstruct
a complex document with multiple child docs each having grandchildren?

I found this code in AddUpdateCommand.iterator():
return new Iterator<SolrInputDocument>() {
  Iterator<SolrInputDocument> iter;

  {
    List<SolrInputDocument> all = flatten(solrDoc);

    String idField = getHashableId();

    boolean isVersion = version != 0;

    for (SolrInputDocument sdoc : all) {
      sdoc.setField(IndexSchema.ROOT_FIELD_NAME, idField);
      if (isVersion) sdoc.setField(CommonParams.VERSION_FIELD, version);
      // TODO: if possible concurrent modification exception (if SolrInputDocument
      // not cloned and is being forwarded to replicas) then we could add this
      // field to the generated lucene document instead.
    }

    iter = all.iterator();
  }

It recursively finds all child docs and turns it into an iterator, adding a 
pointer to root in all children. Guess it would be possible to rewrite this 
part to add a _parent_ field as well?
That would lay the foundation for rewriting [child] transformer to reconstruct 
grandchildren?


Jan

Re: Problem encountered upon starting Solr after improper exit

2018-03-15 Thread Erick Erickson
I've never heard of killing a Java process doing this. These
lines:

dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib
Referenced from: /usr/local/bin/awk

indicate what I expect is the root of your problem, _somehow_
files were deleted. I'd be _very_ surprised if killing the Java
process had anything to do with it, probably files were deleted
and the first time you noticed was when the start script
tried to use awk and failed to load that file.

My bet is that you can't use awk at all anywhere, even
without trying to invoke Solr.
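
A quick way to confirm, plus a possible repair if that awk came from
Homebrew (both are assumptions about your setup):

/usr/local/bin/awk 'BEGIN { print "ok" }'   # fails the same way if libmpfr is missing
brew reinstall mpfr gawk                    # restores the missing dylib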

Best,
Erick

On Wed, Mar 14, 2018 at 9:47 PM, YIFAN LI  wrote:
> To whom it may concern,
>
> I am running Solr 7.1.0 and encountered a problem starting Solr after I
> killed the Java process running Solr without proper cleanup. The error
> message that I received is as following:
>
> solr-7.1.0 liyifan$ bin/solr run
>
>
> dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib
>
>   Referenced from: /usr/local/bin/awk
>
>   Reason: image not found
>
> Your current version of Java is too old to run this version of Solr
>
> We found version , using command 'java -version', with response:
>
> java version "1.8.0_45"
>
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
>
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
>
> Please install latest version of Java 1.8 or set JAVA_HOME properly.
>
>
> Debug information:
>
> JAVA_HOME: N/A
>
> Active Path:
>
> /Users/liyifan/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/opt/local/bin:/opt/local/sbin:/usr/Documents/2016\
> Spring/appcivist-mobilization/activator-dist-1.3.9/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/usr/local/git/bin
>
> After I reset the JAVA_HOME variable, it still gives me the error:
>
> bin/solr start
>
> dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib
>
>   Referenced from: /usr/local/bin/awk
>
>   Reason: image not found
>
> Your current version of Java is too old to run this version of Solr
>
> We found version , using command
> '/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin/java
> -version', with response:
>
> java version "1.8.0_45"
>
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
>
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
>
> Please install latest version of Java 1.8 or set JAVA_HOME properly.
>
>
> Debug information:
>
> JAVA_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home
>
> Active Path:
>
> /Users/liyifan/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/opt/local/bin:/opt/local/sbin:/usr/Documents/2016\
> Spring/appcivist-mobilization/activator-dist-1.3.9/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/usr/local/git/bin:/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin
>
> and the director /usr/local/opt/mpfr/lib/ only contains the following files:
>
> ls /usr/local/opt/mpfr/lib/
>
> libmpfr.6.dylib libmpfr.a libmpfr.dylib pkgconfig
>
> Do you think this problem is caused by killing the Java process without
> proper cleanup? Could you suggest some solution to this problem? Thank you
> very much!
>
> Best,
> Yifan


Re: LTR - OriginalScore query issue

2018-03-15 Thread Alessandro Benedetti
From the snippet you posted, this is the query you run:
q=id:"13245336"

So the original score (for each document in the result set) can only be the
score associated with that query.

You then pass an EFI with a different text.
You can now use that information to calculate another feature if you want.
You can define a SolrFeature:

{
"store" : "myFeatureStore",
"name" : "userTextCat",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : { "q" : "{! http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Some performance questions....

2018-03-15 Thread Deepak Goel
Please see inline...



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Thu, Mar 15, 2018 at 6:04 PM, BlackIce  wrote:

> Shawn:
> well the idea was to utilize system resources more efficiently.. this is
> not due so much to Solr, as I said I don't know that much about Solr,
> except schema.xml and solrconfig.xml - However the main app that will be
> running is more or less a single-threaded app which takes advantage when
> run under several instances, ie: parallelism, so I thought, since I'm at it
> I may give solr a few instances as well... but the more I read, the more
> confused I get.. I've read about some guy running 8 Solr instances on his
> dual Xeon 26xx series, each VM with 12 GB ram..
>
> Deepak:
>
> Well it's kinda a given that when running ANYTHING under a VM you have an
> overhead..

***Deepak***
You mean you are assuming without any facts (performance benchmarks with and
without a VM)
 ***Deepak***

> so since I control the hardware, ie: not sharing space on some
> hosted VM by some ISP... why not skip the whole VM thing entirely?
>
> Thnx for the Heap pointer.. I've read, from some Professor.. that Solr
> actually is more efficient with a very small Heap and to have everything
> mapped to virtual memory... Which brings me to the next question.. is the
> virtual memory mapping done by the OS or Solr? Does the virtual memory
> reside on the OS HDD? Or on the Solr HDD?.. and if the virtual memory
> mapping is done on the OS HDD, wouldn't it be beneficial to run the OS off
> an SSD?
>
> ***Deepak***
The OS does mapping itself to virtual memory (Atleast Unix does). However
am not sure of the internal mechanism of Solr
***Deepak***


> For now.. my FEELING is to run one Solr instance on this particular
> machine.. by the time the RAM is outgrown add another machine and so
> forth...

***Deepak***
I wonder if there are any performance benchmarks showing how Solr scales at
higher loads on a single machine (is it linear or non-linear). Most
software doesn't scale linearly at higher loads.
 ***Deepak***

> I've had a small setback: due to the chassis configuration I could
> only fit in half of the HDDs I intended.. the rest collide with the CPU
> heatsinks (Don't ask)
>  so my entire initial set-up has changed and with it my initial "growth
> strategy"
>
> On Wed, Mar 14, 2018 at 4:15 PM, Shawn Heisey  wrote:
>
> > On 3/14/2018 5:49 AM, BlackIce wrote:
> >
> >> I was just thinking Do I really need separate VM's in order to run
> >> multiple Solr instances? Doesn't it suffice to have each instance in its
> >> own user account?
> >>
> >
> > You can run multiple instances all under the same account on one machine.
> > But for a single machine, why do you need multiple Solr instances at all?
> > One instance can handle many indexes, and will probably do it more
> > efficiently than multiple instances.
> >
> > The only time I would *ever* recommend multiple Solr instances is when a
> > single instance would need an ENORMOUS Java heap -- something much larger
> > than 32GB.  If something like that can be split into multiple instances
> > where each one has a heap that's 31GB heap or less, then memory usage
> will
> > be more efficient and Java's garbage collection will work better.
> >
> > FYI -- Running Java with a 32GB heap actually has LESS memory available
> > than running it with a 31GB heap.  This is because when the heap reaches
> > 32GB, Java must switch to 64-bit pointers, so every little allocation
> > requires a little bit more memory.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Copying a SolrCloud collection to other hosts

2018-03-15 Thread Erick Erickson
yeah, it's on a core-by-core basis. Which also makes getting it
propagated to all replicas something you have to be sure happens...

Glad it's working for you!
Erick

On Thu, Mar 15, 2018 at 1:54 AM, Patrick Schemitz  wrote:
> Hi Erick,
>
> thanks a lot, that solved our problem nicely.
>
> (It took us a try or two to notice that this will not copy the entire
> collection but only the shard on the source instance, and we need to do
> this for all instances explicitly. But hey, we had to do the same for
> the old approach of scp'ing the data directories.)
>
> Ciao, Patrick
>
> On Tue, Mar 06, 2018 at 07:18:15AM -0800, Erick Erickson wrote:
>> this is part of the "different replica types" capability, there are
>> NRT (the only type available prior to 7x), PULL and TLOG which would
>> have different names. I don't know of any way to switch it off.
>>
>> As far as moving the data, here's a little known trick: Use the
>> replication API to issue a fetchindex, see:
>> https://lucene.apache.org/solr/guide/6_6/index-replication.html As
>> long as the target cluster can "see" the source cluster via http, this
>> should work. This is entirely outside SolrCloud and ZooKeeper is not
>> involved. This would even work with, say, one side being stand-alone
>> and the other being SolrCloud (not that you want to do that, just
>> illustrating it's not part of SolrCloud)...
>>
>> So you'd specify something like:
>> http://target_node:port/solr/core_name/replication?command=fetchindex&masterUrl=http://source_node:port/solr/core_name
>>
>> "core_name" in these cases is what appears in the "cores" dropdown on
>> the admin UI page. You do not have to shut Solr down at all on either
>> end to use this, although last I knew the target node would not serve
>> queries while this was happening.
>>
>> An alternative is to not hard-code the names in your copy script,
>> rather look at the information in ZooKeeper for your source and target
>> information, you could do this by using the CLUSTERSTATUS collections
>> API call.
>>
>> Best,
>> Erick
>>
>> On Tue, Mar 6, 2018 at 6:47 AM, Patrick Schemitz  wrote:
>> > Hi List,
>> >
>> > so I'm running a bunch of SolrCloud clusters (each cluster is: 8 shards
>> > on 2 servers, with 4 instances per server, no replicas, i.e. 1 shard per
>> > instance).
>> >
>> > Building the index afresh takes 15+ hours, so when I have to deploy a new
>> > index, I build it once, on one cluster, and then copy (scp) over the
>> > data//index directories (shutting down the Solr instances 
>> > first).
>> >
>> > I could get Solr 6.5.1 to number the shard/replica directories nicely via
>> > the createNodeSet and createNodeSet.shuffle options:
>> >
>> > Solr 6.5.1 /var/lib/solr:
>> >
>> > Server node 1:
>> > instance00/data/main_index_shard1_replica1
>> > instance01/data/main_index_shard2_replica1
>> > instance02/data/main_index_shard3_replica1
>> > instance03/data/main_index_shard4_replica1
>> >
>> > Server node 2:
>> > instance00/data/main_index_shard5_replica1
>> > instance01/data/main_index_shard6_replica1
>> > instance02/data/main_index_shard7_replica1
>> > instance03/data/main_index_shard8_replica1
>> >
>> > However, while attempting to upgrade to 7.2.1, this numbering has changed:
>> >
>> > Solr 7.2.1 /var/lib/solr:
>> >
>> > Server node 1:
>> > instance00/data/main_index_shard1_replica_n1
>> > instance01/data/main_index_shard2_replica_n2
>> > instance02/data/main_index_shard3_replica_n4
>> > instance03/data/main_index_shard4_replica_n6
>> >
>> > Server node 2:
>> > instance00/data/main_index_shard5_replica_n8
>> > instance01/data/main_index_shard6_replica_n10
>> > instance02/data/main_index_shard7_replica_n12
>> > instance03/data/main_index_shard8_replica_n14
>> >
>> > This new numbering breaks my copy script, and furthermore, I'm worried
>> > as to what happens when the numbering is different among target clusters.
>> >
>> > How can I switch this back to the old numbering scheme?
>> >
>> > Side note: is there a recommended way of doing this? Is the
>> > backup/restore mechanism suitable for this? The ref guide is kind of terse
>> > here.
>> >
>> > Thanks in advance,
>> >
>> > Ciao, Patrick


Re: Remove Replacement character "�" from the search results

2018-03-15 Thread Erick Erickson
This is more likely a problem with your browser's character set, try
setting it to UTF-8.
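
One quick way to verify the Solr side is fine is to check the charset in
the response headers (the URL here is illustrative):

curl -s -D - -o /dev/null "http://localhost:8983/solr/mycollection/select?q=*:*&rows=1"
# look for: Content-Type: ...;charset=utf-8
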
Best,
Erick

On Thu, Mar 15, 2018 at 5:59 AM, uttam Dhakal  wrote:
> Hello,
>
> I want to remove certain characters from the search result. Image of what I
> now get (and want to avoid) is attached in this email.
>
> My impression is I need to write an "updaterequestprocessor", is there any
> built-in class specific for this need? Closest class I found which matches
> the description is "IgnoreFieldUpdateProcessorFactory" but I am not sure
> about this.
>
> Any help on how to do this will very helpful as I am very new to Solr.
>
> Thank you very much
>
> Sincerely,
> Uttam Dhakal
>


Re: solr query

2018-03-15 Thread Walter Underwood
We have an index with thousands of fields. Only a few are accessed on each 
query.

These fields break out three different kinds of weights for a thousand or so 
different school subjects. Each query is just for one subject, so the scoring 
uses those three fields. Like:

* weight_a_1234
* weight_b_1234
* weight_c_1234

Another query would use:

* weight_a_4567
* weight_b_4567
* weight_c_4567

And so on… Lots of fields, but queries run in 15 milliseconds even with updates 
every few minutes.
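
A sketch of one such query (the boost function is illustrative, not our
exact formula):

/select?q=algebra
  &defType=edismax
  &boost=sum(weight_a_1234,weight_b_1234,weight_c_1234)
  &fl=id,score

Since only the three fields for the requested subject appear in the query,
the thousands of other weight fields cost nothing at query time.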

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 15, 2018, at 2:51 AM, Stefan Matheis  wrote:
> 
>> Is this practical adding so much additional fields?
> 
> Well, as always "it depends" .. the way I see it: what are 20 fields? They
> just sit around and make your life way easier.
> 
> You have two choices: stay with one field and have a hard time ... or add
> another 20 or so which do exactly what you want.
> 
> To complete this with another use case: full text search including phonetic
> matches. Instead of just having _one_ field for "name" and an awful list of
> analyzers on it ... I'd probably go with a bunch of them: Separate first-
> and last name, another two (one for each part of the name) for lowercase
> variants, another two for phonetic variants .. which already makes six. You
> get the possibility to do boosting on those aspects more or less for free
> ..
> 
> Almost always it's write once, read often .. so do the heavy lifting while
> indexing and enjoy simple queries.
> 
> - Stefan
> 
> On Mar 15, 2018 8:12 AM, "Albert Lee"  wrote:
> 
>> Cause I got about 20 date fields or more. If add a separate field for it,
>> then I have to add additional 3 field for each of them.  For example, for
>> the field birthdate, I need to add birthdate_year, birthdate_month,
>> birthdate_day.
>> Is this practical adding so much additional fields?
>> 
>> Albert
>> From: Stefan Matheis
>> Sent: Thursday, March 15, 2018 3:05 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: solr query
>> 
>>> You have any other idea?
>> 
>> Yes, we go back to start and discuss again why you're not adding a separate
>> field for that. It's the simplest thing possible and avoids all those
>> workarounds that got mentioned.
>> 
>> -Stefan
>> 
>> On Mar 15, 2018 4:08 AM, "Albert Lee"  wrote:
>> 
>>> Hi Emir,
>>> 
>>> If using OR-ed conditions for different years then the query will be very
>>> long if I got 100 years and I think this is not practical.
>>> You have any other idea?
>>> 
>>> Regards,
>>> Albert
>>> From: Gus Heck
>>> Sent: Thursday, March 15, 2018 12:43 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: solr query
>>> 
>>> I think you have inadvertently "corrected" the intentional exclusive end
>> on
>>> my range... [NOW/MONTH TO NOW/MONTH+1MONTH}
>>> 
>>> On Wed, Mar 14, 2018 at 12:08 PM, Emir Arnautović <
>>> emir.arnauto...@sematext.com> wrote:
>>> 
 Hi Gus,
 It is just current month, but Albert is interested in month, regardless
>>> of
 year. It can be done with OR-ed conditions for different years:
 birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH] OR birthDate:[NOW-1YEAR/MONTH
>>> TO
 NOW-1YEAR/MONTH+1MONTH] OR birthDate:[NOW-2YEAR/MONTH TO
 NOW-2YEAR/MONTH+1MONTH] OR…
 
 Emir
 --
 Monitoring - Log Management - Alerting - Anomaly Detection
 Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
 
 
 
> On 14 Mar 2018, at 16:55, Gus Heck  wrote:
> 
> I think you can specify the current month with
> 
> birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH}
> 
> does that work for you?
> 
> On Wed, Mar 14, 2018 at 6:32 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
>> Actually you don’t have to add another field - there is function ms
>>> that
>> converts date to timestamp. What you can do is use frange query
>> parser
 and
>> play bit with math, e.g. sub(ms(date_field),ms(NOW/YEAR)) will give
>>> you
>> ms elapsed since this year and you know that from 0 to 31*86400000 is
>> January, from 31*86400000+1 to … is February and so on.
>> 
>> If you go this path, I would suggest custom function that will
>> convert
>> date to month/year.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training -
>>> http://sematext.com/
>> 
>> 
>> 
>>> On 14 Mar 2018, at 10:53, Albert Lee 
>>> wrote:
>>> 
>>> I don’t want to add separate fields since I have many dates to
>> index.
>> How to index it as timestamp and do function query, any example or
>> documentation?
>>> 
>>> Regards,
>>> Albert
>>> 
>>> From: Emir Arnautović
>>> Sent: Wednesday, March 14, 2018 5:38 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: solr query
>>> 
>>> Hi Albert,
>>> The simplest solution is to index month/year as separate fields.

SOLR subscription

2018-03-15 Thread SAMMAR UL HASSAN
Hi,

I hope all is well. We are using SOLR for searches in our products. We want to
share some feedback and also discuss various issues. As per your website, we
need to subscribe to solr-user@lucene.apache.org to discuss the different
queries, so please register my email address to join the Solr community. If
any additional information is required, please let me know.

I thank you in advance for your timely support.
Regards
Syed Sammar ul Hassan
Sr. Software Engineer (IT)
MTBC | A Unique Healthcare IT Company(r)
7 Clyde Road | Somerset, NJ 08873
P:  732-873-5133 x225 | F:  732-873-3378
www.mtbc.com | 
sammarulhas...@mtbc.com



Re: Expose a metric for percentage-recovered during full recoveries

2018-03-15 Thread Andrzej Białecki
Hi S G,

This looks useful, and it should be easy to add to the existing metrics in 
ReplicationHandler, probably somewhere around ReplicationHandler:856.
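
Roughly along these lines, using the Dropwizard metrics Solr already ships
with (a sketch only - the names and the fetch-state accessors are made up,
this is not the actual handler code):

import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class RecoveryMetricsSketch {

  // hypothetical stand-in for the index fetcher's bookkeeping
  public interface IndexFetchState {
    long bytesDownloaded();
    long totalBytesExpected();
  }

  public static void register(MetricRegistry registry, IndexFetchState state) {
    // the gauge is evaluated lazily on every metrics poll
    registry.register("replication.fetcher.percentRecovered",
        (Gauge<Double>) () -> state.totalBytesExpected() == 0
            ? 0.0
            : 100.0 * state.bytesDownloaded() / state.totalBytesExpected());
  }
}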

> On 14 Mar 2018, at 20:16, S G  wrote:
> 
> Hi,
> 
> Solr does full recoveries very frequently - sometimes even for seemingly
> simple cases like adding a field to the schema, a couple of nodes go into
> recovery.
> It would be nice if it did not do such full recoveries so frequently but
> since that may require a lot of fixing, can we have a metric that reports
> how much a core has recovered already?
> 
> Example:
> 
> $ cd data
> $ du -h . | grep  my_collection | grep -w index
> 77G   ./my_collection_shard3_replica2/data/index.20180314184942993
> 145G ./my_collection_shard3_replica2/data/index.20180112001943687
> 
> This shows that the shard3-replica2 core is doing a full recovery and has
> only copied 77G out of 145G
> That is about 50% recovery done.
> 
> 
> It would be very nice if we can have this as a JMX metric and we can then
> plot it somewhere instead of having to keep running the same command in a
> loop and guessing how much is left to be copied.
> 
> A metric like the following would be great:
> {
>"my_collection_shard3_replica2": {
> "recovery": {
>  "currentSize": "77 gb",
>  "expectedSize": "145 gb",
>  "percentRecovered": "50",
>  "startTimeEpoch": "361273126317"
>  }
>}
> }
> 
> If it looks useful, I will open a JIRA for the same.
> 
> Thanks
> SG



Remove Replacement character "�" from the search results

2018-03-15 Thread uttam Dhakal
Hello,
I want to remove certain characters from the search result. Image of what I now 
get (and want to avoid) is attached in this email.
My impression is that I need to write an "updaterequestprocessor"; is there
any built-in class specific to this need? The closest class I found matching
the description is "IgnoreFieldUpdateProcessorFactory", but I am not sure
about it.
Any help on how to do this will be very helpful as I am very new to Solr.
Thank you very much.
Sincerely,
Uttam Dhakal


Re: Matching Queries with Wildcards and Numbers

2018-03-15 Thread tapan1707
I think it should have worked. Could you share the results for both queries
with &debug=true? 
Also, what's the result for ec1?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Apache commons fileupload migration

2018-03-15 Thread padmanabhan1616
Hi Team,

We are using Apache SOLR-5.2.1 as the index engine for our data analytics
application. As part of this, SOLR uses commons-fileupload-1.2.1.jar for
file manipulation. There is a security vulnerability identified in the
commons-fileupload library: *CVE-2016-1000031 Apache Commons FileUpload:
DiskFileItem file manipulation*. As per the official notice from the Apache
Software Foundation, this issue has been addressed in
commons-fileupload-1.3.3.jar, which is available to all dependency vendors.

*Is it safe to upgrade commons-fileupload from 1.2.1 to 1.3.3 directly?*
Please suggest the best way to handle this.

Note - *we currently have no requirement to upgrade Solr, so please suggest
the best way to handle this vulnerability without upgrading the entire
SOLR.*

Thanks,
Padmanabhan
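
PS: What we are considering is simply swapping the jar in place. A sketch -
the paths below assume a default Solr 5.x layout and this is untested, so
please correct us if it is unsafe:

# stop Solr first; paths are illustrative
cd /opt/solr/server/solr-webapp/webapp/WEB-INF/lib
mv commons-fileupload-1.2.1.jar /tmp/
cp /path/to/commons-fileupload-1.3.3.jar .
# restart Solr and re-test anything that uploads files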



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Execution of query in subentity dependent on a field from the main entity

2018-03-15 Thread PeterKerk
How can I make the execution of a query in a subentity dependent on a field
value from the main entity?

So as you can see in the (simplified) data config below, there's an entity
`categories_lvl_0` which holds an expensive query. I ONLY want to execute
this query if: searchobject.objecttype=115

How can I configure this?

[data-config.xml snippet stripped by the mailing-list archive]
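
The only workaround I have come up with so far is to push the condition into
the child query itself, roughly like this (table and column names
simplified):

<document>
  <entity name="searchobject"
          query="SELECT id, objecttype FROM searchobjects">
    <!-- sketch: the expensive child query returns an empty result set
         unless the parent row has objecttype = 115 -->
    <entity name="categories_lvl_0"
            query="SELECT name FROM categories
                   WHERE '${searchobject.objecttype}' = '115'
                     AND objectid = '${searchobject.id}'"/>
  </entity>
</document>

Is there a cleaner way, e.g. a flag on the entity itself?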
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Some performance questions....

2018-03-15 Thread BlackIce
Shawn:
well the idea was to utilize system resources more efficiently.. this is
not due so much to Solr, as I said I don't know that much about Solr,
except schema.xml and solrconfig.xml - However the main app that will be
running is more or less a single-threaded app which takes advantage of
being run as several instances, i.e. parallelism, so I thought, since I'm
at it, I may give Solr a few instances as well... but the more I read, the
more confused I get.. I've read about some guy running 8 Solr instances on
his dual Xeon 26xx series, each VM with 12 GB RAM..

Deepak:

Well it's kinda a given that when running ANYTHING under a VM you have an
overhead.. so since I control the hardware, i.e. not sharing space on some
VM hosted by some ISP... why not skip the whole VM thing entirely?

Thnx for the heap pointer.. I've read, from some professor, that Solr
actually is more efficient with a very small heap and with everything
mapped to virtual memory... Which brings me to the next question: is the
virtual memory mapping done by the OS or by Solr? Does the virtual memory
reside on the OS HDD or on the Solr HDD? And if the virtual memory mapping
is done on the OS HDD, wouldn't it be beneficial to run the OS off an SSD?

For now.. my FEELING is to run one Solr instance on this particular
machine.. by the time the RAM is outgrown, add another machine and so
forth... I've had a small set-back: due to the chassis configuration I
could only fit in half of the HDDs I intended.. the rest collide with the
CPU heatsinks (don't ask)
 so my entire initial set-up has changed and with it my initial "growth
strategy"
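
Btw, in case it helps anyone searching later - the small-heap setup I keep
reading about apparently boils down to this in solr.in.sh (a sketch, sizes
made up):

# /etc/default/solr.in.sh -- sketch, tune the sizes for your box
SOLR_HEAP="4g"
# The rest of the RAM is left to the OS page cache: Lucene's MMapDirectory
# maps the index files (which live on the Solr data disk) into virtual
# memory, and the OS -- not Solr -- decides which pages stay in RAM. It is
# not a swap area on the OS disk, so an SSD for the OS alone would not help.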

On Wed, Mar 14, 2018 at 4:15 PM, Shawn Heisey  wrote:

> On 3/14/2018 5:49 AM, BlackIce wrote:
>
>> I was just thinking Do I really need separate VM's in order to run
>> multiple Solr instances? Doesn't it suffice to have each instance in its
>> own user account?
>>
>
> You can run multiple instances all under the same account on one machine.
> But for a single machine, why do you need multiple Solr instances at all?
> One instance can handle many indexes, and will probably do it more
> efficiently than multiple instances.
>
> The only time I would *ever* recommend multiple Solr instances is when a
> single instance would need an ENORMOUS Java heap -- something much larger
> than 32GB.  If something like that can be split into multiple instances
> where each one has a heap that's 31GB heap or less, then memory usage will
> be more efficient and Java's garbage collection will work better.
>
> FYI -- Running Java with a 32GB heap actually has LESS memory available
> than running it with a 31GB heap.  This is because when the heap reaches
> 32GB, Java must switch to 64-bit pointers, so every little allocation
> requires a little bit more memory.
>
> Thanks,
> Shawn
>
>


Re: Expose a metric for percentage-recovered during full recoveries

2018-03-15 Thread Rick Leir
S G,
Were there errors in the logs just before recoveries?
Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

LTR - OriginalScore query issue

2018-03-15 Thread ilayaraja
solr/collection/select?fl=id,score,[features+store=myFeatureStore+efi.user_query='black
shoes']&wt=json&q=id:"13245336"&debugQuery=on

When we fire this query during feature extraction, the originalScore feature
gets the score of the "id" match but not the actual user query which is in
this case 'black shoes'.

The feature definition is in features.json:
 {
  "name":"originalScore",
  "class":"org.apache.solr.ltr.feature.OriginalScoreFeature",
  "params":null,
  "store":"myFeatureStore"}

Is this the expected behaviour? I do not want the tf/idf score for matching
the id; rather, I want the score for the query w.r.t. that item.
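
The only workaround I see is to flip the query around: make the user query
the main q and restrict to the item with a filter, so that the original
score is computed for the user query. A sketch (the qf is illustrative):

solr/collection/select?q={!dismax qf=title}black shoes
  &fq=id:"13245336"
  &fl=id,score,[features store=myFeatureStore efi.user_query='black shoes']
  &wt=json

Is that the intended usage?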




-
--Ilay
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: In Place Updates not work as expected

2018-03-15 Thread Emir Arnautović
Hi,
Can you share part of code where you prepare update.
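
For reference, an update that is eligible for the in-place path is prepared
roughly like this in SolrJ (a sketch - "popularity" is a made-up field that
must be single-valued, with docValues="true", indexed="false",
stored="false"):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class InPlaceUpdateSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
             new CloudSolrClient.Builder().withZkHost("zk1:2181").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      // atomic "set" on a docValues-only field; if the field is also
      // indexed or stored, Solr falls back to a regular atomic update
      doc.addField("popularity", Collections.singletonMap("set", 42L));
      client.add("collection1", doc);
      client.commit("collection1");
    }
  }
}

If the field does not meet those conditions, the fallback atomic update
re-indexes the whole document, which would explain why it is not faster.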

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 14 Mar 2018, at 15:27, mganeshs  wrote:
> 
> Hi Emir,
> 
> I am using solrj to update the document. Is there any special API to be
> used for in-place updates?
> 
> Yes, we are updating in batches of 1000 documents.
> 
> As I mentioned before, since I am updating only docValues I expect it to
> update faster than updating a normal field. Isn't it?
> 
> Regards,
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



RE: solr query

2018-03-15 Thread Stefan Matheis
> Is this practical adding so much additional fields?

Well, as always "it depends" .. the way I see it: what are 20 fields? They
just sit around and make your life way easier.

You have two choices: stay with one field and have a hard time ... or add
another 20 or so which do exactly what you want.

To complete this with another use case: full text search including phonetic
matches. Instead of just having _one_ field for "name" and an awful list of
analyzers on it ... I'd probably go with a bunch of them: Separate first-
and last name, another two (one for each part of the name) for lowercase
variants, another two for phonetic variants .. which already makes six. You
get the possibility to do boosting on those aspects more or less for free
..

Almost always it's write once, read often .. so do the heavy lifting while
indexing and enjoy simple queries.
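
A rough sketch of those six fields in schema.xml (types and analyzers
illustrative):

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<field name="firstname"          type="string"        indexed="true" stored="true"/>
<field name="lastname"           type="string"        indexed="true" stored="true"/>
<field name="firstname_lower"    type="text_general"  indexed="true" stored="false"/>
<field name="lastname_lower"     type="text_general"  indexed="true" stored="false"/>
<field name="firstname_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<field name="lastname_phonetic"  type="text_phonetic" indexed="true" stored="false"/>

<copyField source="firstname" dest="firstname_lower"/>
<copyField source="firstname" dest="firstname_phonetic"/>
<copyField source="lastname"  dest="lastname_lower"/>
<copyField source="lastname"  dest="lastname_phonetic"/>

Then something like qf=firstname^5 lastname^5 firstname_lower^2
lastname_lower^2 firstname_phonetic lastname_phonetic gives you the
per-aspect boosting more or less for free.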

- Stefan

On Mar 15, 2018 8:12 AM, "Albert Lee"  wrote:

> Cause I got about 20 date fields or more. If add a separate field for it,
> then I have to add additional 3 field for each of them.  For example, for
> the field birthdate, I need to add birthdate_year, birthdate_month,
> birthdate_day.
> Is this practical adding so much additional fields?
>
> Albert
> From: Stefan Matheis
> Sent: Thursday, March 15, 2018 3:05 PM
> To: solr-user@lucene.apache.org
> Subject: RE: solr query
>
> > You have any other idea?
>
> Yes, we go back to start and discuss again why you're not adding a separate
> field for that. It's the simplest thing possible and avoids all those
> workarounds that got mentioned.
>
> -Stefan
>
> On Mar 15, 2018 4:08 AM, "Albert Lee"  wrote:
>
> > Hi Emir,
> >
> > If using OR-ed conditions for different years then the query will be very
> > long if I got 100 years and I think this is not practical.
> > You have any other idea?
> >
> > Regards,
> > Albert
> > From: Gus Heck
> > Sent: Thursday, March 15, 2018 12:43 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: solr query
> >
> > I think you have inadvertently "corrected" the intentional exclusive end
> on
> > my range... [NOW/MONTH TO NOW/MONTH+1MONTH}
> >
> > On Wed, Mar 14, 2018 at 12:08 PM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> > > Hi Gus,
> > > It is just current month, but Albert is interested in month, regardless
> > of
> > > year. It can be done with OR-ed conditions for different years:
> > > birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH] OR birthDate:[NOW-1YEAR/MONTH
> > TO
> > > NOW-1YEAR/MONTH+1MONTH] OR birthDate:[NOW-2YEAR/MONTH TO
> > > NOW-2YEAR/MONTH+1MONTH] OR…
> > >
> > > Emir
> > > --
> > > Monitoring - Log Management - Alerting - Anomaly Detection
> > > Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> > >
> > >
> > >
> > > > On 14 Mar 2018, at 16:55, Gus Heck  wrote:
> > > >
> > > > I think you can specify the current month with
> > > >
> > > > birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH}
> > > >
> > > > does that work for you?
> > > >
> > > > On Wed, Mar 14, 2018 at 6:32 AM, Emir Arnautović <
> > > > emir.arnauto...@sematext.com> wrote:
> > > >
> > > >> Actually you don’t have to add another field - there is function ms
> > that
> > > >> converts date to timestamp. What you can do is use frange query
> parser
> > > and
> > > >> play bit with math, e.g. sub(ms(date_field),ms(NOW/YEAR)) will give
> > you
> > >> ms elapsed since this year and you know that from 0 to 31*86400000 is
> > >> January, from 31*86400000+1 to … is February and so on.
> > > >>
> > > >> If you go this path, I would suggest custom function that will
> convert
> > > >> date to month/year.
> > > >>
> > > >> HTH,
> > > >> Emir
> > > >> --
> > > >> Monitoring - Log Management - Alerting - Anomaly Detection
> > > >> Solr & Elasticsearch Consulting Support Training -
> > http://sematext.com/
> > > >>
> > > >>
> > > >>
> > > >>> On 14 Mar 2018, at 10:53, Albert Lee 
> > wrote:
> > > >>>
> > > >>> I don’t want to add separate fields since I have many dates to
> index.
> > > >> How to index it as timestamp and do function query, any example or
> > > >> documentation?
> > > >>>
> > > >>> Regards,
> > > >>> Albert
> > > >>>
> > > >>> From: Emir Arnautović
> > > >>> Sent: Wednesday, March 14, 2018 5:38 PM
> > > >>> To: solr-user@lucene.apache.org
> > > >>> Subject: Re: solr query
> > > >>>
> > > >>> Hi Albert,
> > > >>> The simplest solution is to index month/year as separate fields.
> > > >> Alternative is to index it as timestamp and do function query to do
> > some
> > > >> math and filter out records.
> > > >>>
> > > >>> Emir
> > > >>> --
> > > >>> Monitoring - Log Management - Alerting - Anomaly Detection
> > > >>> Solr & Elasticsearch Consulting Support Training -
> > > http://sematext.com/
> > > >>>
> > > >>>
> > > >>>
> > >  On 14 Mar 2018, at 10:31, Albert Lee 
> > wrote:
> > > 
> > >  NOW/MONTH and NOW/YEAR to get the start of month/year, but how
> can I
> > > >> get current month of regardless year. Like the use case,  people
> who’s
> 

Re: Copying a SolrCloud collection to other hosts

2018-03-15 Thread Patrick Schemitz
Hi Erick,

thanks a lot, that solved our problem nicely.

(It took us a try or two to notice that this will not copy the entire
collection but only the shard on the source instance, and we need to do
this for all instances explicitly. But hey, we had to do the same for
the old approach of scp'ing the data directories.)
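
For anyone finding this later, our per-shard loop ended up looking roughly
like this - hosts and core names are examples, the real ones should be read
from CLUSTERSTATUS:

#!/bin/bash
# one fetchindex per shard; each target core pulls from its source core
while read target source; do
  curl -s "http://${target}/replication?command=fetchindex&masterUrl=http://${source}"
done <<'EOF'
node1:8983/solr/main_index_shard1_replica_n1 build1:8983/solr/main_index_shard1_replica_n1
node1:8984/solr/main_index_shard2_replica_n2 build1:8984/solr/main_index_shard2_replica_n2
EOF
# ...and so on for the remaining shards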

Ciao, Patrick

On Tue, Mar 06, 2018 at 07:18:15AM -0800, Erick Erickson wrote:
> this is part of the "different replica types" capability, there are
> NRT (the only type available prior to 7x), PULL and TLOG which would
> have different names. I don't know of any way to switch it off.
> 
> As far as moving the data, here's a little known trick: Use the
> replication API to issue a fetchindex, see:
> https://lucene.apache.org/solr/guide/6_6/index-replication.html As
> long as the target cluster can "see" the source cluster via http, this
> should work. This is entirely outside SolrCloud and ZooKeeper is not
> involved. This would even work with, say, one side being stand-alone
> and the other being SolrCloud (not that you want to do that, just
> illustrating it's not part of SolrCloud)...
> 
> So you'd specify something like:
> http://target_node:port/solr/core_name/replication?command=fetchindex&masterUrl=http://source_node:port/solr/core_name
> 
> "core_name" in these cases is what appears in the "cores" dropdown on
> the admin UI page. You do not have to shut Solr down at all on either
> end to use this, although last I knew the target node would not serve
> queries while this was happening.
> 
> An alternative is to not hard-code the names in your copy script,
> rather look at the information in ZooKeeper for your source and target
> information, you could do this by using the CLUSTERSTATUS collections
> API call.
> 
> Best,
> Erick
> 
> On Tue, Mar 6, 2018 at 6:47 AM, Patrick Schemitz  wrote:
> > Hi List,
> >
> > so I'm running a bunch of SolrCloud clusters (each cluster is: 8 shards
> > on 2 servers, with 4 instances per server, no replicas, i.e. 1 shard per
> > instance).
> >
> > Building the index afresh takes 15+ hours, so when I have to deploy a new
> > index, I build it once, on one cluster, and then copy (scp) over the
> > data/<core>/index directories (shutting down the Solr instances
> > first).
> >
> > I could get Solr 6.5.1 to number the shard/replica directories nicely via
> > the createNodeSet and createNodeSet.shuffle options:
> >
> > Solr 6.5.1 /var/lib/solr:
> >
> > Server node 1:
> > instance00/data/main_index_shard1_replica1
> > instance01/data/main_index_shard2_replica1
> > instance02/data/main_index_shard3_replica1
> > instance03/data/main_index_shard4_replica1
> >
> > Server node 2:
> > instance00/data/main_index_shard5_replica1
> > instance01/data/main_index_shard6_replica1
> > instance02/data/main_index_shard7_replica1
> > instance03/data/main_index_shard8_replica1
> >
> > However, while attempting to upgrade to 7.2.1, this numbering has changed:
> >
> > Solr 7.2.1 /var/lib/solr:
> >
> > Server node 1:
> > instance00/data/main_index_shard1_replica_n1
> > instance01/data/main_index_shard2_replica_n2
> > instance02/data/main_index_shard3_replica_n4
> > instance03/data/main_index_shard4_replica_n6
> >
> > Server node 2:
> > instance00/data/main_index_shard5_replica_n8
> > instance01/data/main_index_shard6_replica_n10
> > instance02/data/main_index_shard7_replica_n12
> > instance03/data/main_index_shard8_replica_n14
> >
> > This new numbering breaks my copy script, and furthermore, I'm worried
> > as to what happens when the numbering is different among target clusters.
> >
> > How can I switch this back to the old numbering scheme?
> >
> > Side note: is there a recommended way of doing this? Is the
> > backup/restore mechanism suitable for this? The ref guide is kind of terse
> > here.
> >
> > Thanks in advance,
> >
> > Ciao, Patrick


Re: Solr on DC/OS ?

2018-03-15 Thread Hendrik Haddorp

Hi,

we are running Solr on Marathon/Mesos, which should basically be the 
same as DC/OS. Solr and ZooKeeper are running in docker containers. I 
wrote my own Mesos framework that handles the assignment to the agents. 
There is a public sample that does the same for ElasticSearch. I'm not 
aware of a public Solr Mesos framework. The only "mediation" that 
happens here is that Solr runs in a docker container with a memory 
limit. If you give it enough resources it should be pretty close to 
running straight on the machine. JVM memory tuning under docker is,
however, not the most fun.
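
For illustration, the shape of such a container launch (a sketch, not our
actual framework config; as far as I remember the official solr image reads
SOLR_HEAP and ZK_HOST from the environment):

docker run -d --name solr1 \
  --memory=8g \
  -e SOLR_HEAP=2g \
  -e ZK_HOST=zk1:2181,zk2:2181,zk3:2181 \
  -p 8983:8983 \
  solr:7.2.1

The gap between the container limit (8g) and the heap (2g) is what keeps
the page cache for the memory-mapped index; squeeze it too hard and the
container gets OOM-killed or the index goes cold.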


regards,
Hendrik

On 15.03.2018 00:09, Rick Leir wrote:

Søren,
DC/OS installs on top of Ubuntu or RedHat, and it is used to coordinate many 
machines so they appear as a cluster.

Solr needs to be on a single machine, or in the case of SolrCloud, on many 
machines. It has no need of the coordination which DC/OS provides. Solr depends 
on direct access to lots of memory, and if any coordination layer attempts to 
mediate access to the memory then Solr would slow down. I recommend you install 
Solr directly on Ubuntu or Redhat or Windows Server (Disclosure: I know very 
little about DC/OS)
Cheers -- Rick


On March 14, 2018 6:19:22 AM EDT, "Søren"  wrote:

Hi, has anyone experience in running solr on DC/OS?

If so, how is that achieved succesfully? Solr is not in Universe.

Thanks in advance,
Soren




Re: Solr document routing using composite key

2018-03-15 Thread Zheng Lin Edwin Yeo
Hi,

What version of Solr are you running? How did you configure your shards in
Solr?
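
Also note that with roughly one document per shard, heavy skew is expected
from uniform hashing alone. A quick sanity check, independent of Solr:

public class HashSkew {
  public static void main(String[] args) {
    double n = 117; // documents == shards
    // expected number of empty shards when n items are hashed uniformly
    // into n buckets: n * ((n-1)/n)^n, which is roughly n/e
    double expectedEmpty = n * Math.pow((n - 1) / n, n);
    System.out.printf("expected empty shards: %.1f%n", expectedEmpty); // ~42.9
  }
}

So around 43 empty shards is the statistical expectation at this sample
size; the 38 you observed is in line with uniform routing.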

Regards,
Edwin

On 7 March 2018 at 02:53, Nawab Zada Asad Iqbal  wrote:

> Hi solr community:
>
>
> I have been thinking to use composite key for my next project iteration and
> tried it today to see how it distributes the documents.
>
> Here is a gist of my code:
> https://gist.github.com/niqbal/3e293e2bcb800d6912a250d914c9d478
>
> I have 117 shards and I tried to use document ids from zero to 116. I find
> that the distribution is very uneven, e.g., the largest bucket receives a
> total of 5 documents and around 38 shards will be empty. Is this expected?
>
> In the following result: value1 is the shard number, value 2 is a list of
> documents which it received.
>
> List(98:List(29)
> , 34:List(36)
> , 8:List(54)
> , 73:List(31)
> , 19:List(77)
> , 23:List(59)
> , 62:List(86)
> , 77:List(105)
> , 11:List(11)
> , 104:List(23)
> , 44:List(4)
> , 37:List(0)
> , 61:List(71)
> , 107:List(37)
> , 46:List(34)
> , 99:List(19)
> , 24:List(32)
> , 94:List(90)
> , 103:List(106)
> , 72:List(97)
> , 59:List(2)
> , 76:List(6)
> , 54:List(20)
> , 65:List(3)
> , 71:List(26)
> , 108:List(17)
> , 106:List(57)
> , 17:List(108)
> , 25:List(13)
> , 60:List(56)
> , 102:List(87)
> , 69:List(60)
> , 64:List(53)
> , 53:List(85)
> , 42:List(35)
> , 115:List(82)
> , 0:List(28)
> , 20:List(27)
> , 81:List(39)
> , 101:List(92)
> , 30:List(16)
> , 41:List(63)
> , 3:List(10)
> , 91:List(21)
> , 85:List(18)
> , 28:List(8)
> , 113:List(76, 95)
> , 51:List(47, 102)
> , 78:List(30, 67)
> , 4:List(52, 84)
> , 110:List(112, 116)
> , 9:List(1, 40)
> , 50:List(22, 101)
> , 13:List(72, 83)
> , 35:List(73, 100)
> , 16:List(48, 64)
> , 112:List(69, 103)
> , 10:List(14, 66)
> , 87:List(68, 104)
> , 57:List(49, 114)
> , 36:List(79, 99)
> , 1:List(24, 70)
> , 96:List(5, 98)
> , 95:List(45, 89)
> , 75:List(9, 91)
> , 70:List(62, 78)
> , 2:List(74, 75)
> , 114:List(81, 88)
> , 74:List(7, 115)
> , 52:List(46, 111)
> , 55:List(12, 50, 113)
> , 47:List(43, 44, 96)
> , 92:List(25, 33, 58)
> , 39:List(15, 41, 61, 107)
> , 21:List(38, 51, 55, 93, 110)
> , 27:List(42, 65, 80, 94, 109)
> )
>


RE: solr query

2018-03-15 Thread Albert Lee
Because I have about 20 date fields or more. If I add a separate field for
each, then I have to add an additional 3 fields for each of them. For
example, for the field birthdate, I need to add birthdate_year,
birthdate_month and birthdate_day.
Is it practical to add so many additional fields?

Albert 
From: Stefan Matheis
Sent: Thursday, March 15, 2018 3:05 PM
To: solr-user@lucene.apache.org
Subject: RE: solr query

> You have any other idea?

Yes, we go back to start and discuss again why you're not adding a separate
field for that. It's the simplest thing possible and avoids all those
workarounds that got mentioned.

-Stefan

On Mar 15, 2018 4:08 AM, "Albert Lee"  wrote:

> Hi Emir,
>
> If using OR-ed conditions for different years then the query will be very
> long if I got 100 years and I think this is not practical.
> You have any other idea?
>
> Regards,
> Albert
> From: Gus Heck
> Sent: Thursday, March 15, 2018 12:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: solr query
>
> I think you have inadvertently "corrected" the intentional exclusive end on
> my range... [NOW/MONTH TO NOW/MONTH+1MONTH}
>
> On Wed, Mar 14, 2018 at 12:08 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
> > Hi Gus,
> > It is just current month, but Albert is interested in month, regardless
> of
> > year. It can be done with OR-ed conditions for different years:
> > birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH] OR birthDate:[NOW-1YEAR/MONTH
> TO
> > NOW-1YEAR/MONTH+1MONTH] OR birthDate:[NOW-2YEAR/MONTH TO
> > NOW-2YEAR/MONTH+1MONTH] OR…
> >
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 14 Mar 2018, at 16:55, Gus Heck  wrote:
> > >
> > > I think you can specify the current month with
> > >
> > > birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH}
> > >
> > > does that work for you?
> > >
> > > On Wed, Mar 14, 2018 at 6:32 AM, Emir Arnautović <
> > > emir.arnauto...@sematext.com> wrote:
> > >
> > >> Actually you don’t have to add another field - there is function ms
> that
> > >> converts date to timestamp. What you can do is use frange query parser
> > and
> > >> play bit with math, e.g. sub(ms(date_field),ms(NOW/YEAR)) will give
> you
> > >> ms elapsed since this year and you know that from 0 to 31*86400000 is
> > >> January, from 31*86400000+1 to … is February and so on.
> > >>
> > >> If you go this path, I would suggest custom function that will convert
> > >> date to month/year.
> > >>
> > >> HTH,
> > >> Emir
> > >> --
> > >> Monitoring - Log Management - Alerting - Anomaly Detection
> > >> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> > >>
> > >>
> > >>
> > >>> On 14 Mar 2018, at 10:53, Albert Lee 
> wrote:
> > >>>
> > >>> I don’t want to add separate fields since I have many dates to index.
> > >> How to index it as timestamp and do function query, any example or
> > >> documentation?
> > >>>
> > >>> Regards,
> > >>> Albert
> > >>>
> > >>> From: Emir Arnautović
> > >>> Sent: Wednesday, March 14, 2018 5:38 PM
> > >>> To: solr-user@lucene.apache.org
> > >>> Subject: Re: solr query
> > >>>
> > >>> Hi Albert,
> > >>> The simplest solution is to index month/year as separate fields.
> > >> Alternative is to index it as timestamp and do function query to do
> some
> > >> math and filter out records.
> > >>>
> > >>> Emir
> > >>> --
> > >>> Monitoring - Log Management - Alerting - Anomaly Detection
> > >>> Solr & Elasticsearch Consulting Support Training -
> > http://sematext.com/
> > >>>
> > >>>
> > >>>
> >  On 14 Mar 2018, at 10:31, Albert Lee 
> wrote:
> > 
> >  NOW/MONTH and NOW/YEAR to get the start of month/year, but how can I
> > >> get current month of regardless year. Like the use case,  people who’s
> > >> birthdate is this month?
> > 
> >  Regard,
> >  Albert
> > 
> > 
> >  From: Emir Arnautović
> >  Sent: Wednesday, March 14, 2018 5:26 PM
> >  To: solr-user@lucene.apache.org
> >  Subject: Re: solr query
> > 
> >  Hi Albert,
> >  It does - you can use NOW/MONTH and NOW/YEAR to get the start of
> > >> month/year. Here is reference to date math:
> https://lucene.apache.org/
> > >> solr/guide/6_6/working-with-dates.html#WorkingwithDates-
> DateMathSyntax
> > <
> > >> https://lucene.apache.org/solr/guide/6_6/working-with-
> > >> dates.html#WorkingwithDates-DateMathSyntax>
> > 
> >  HTH,
> >  Emir
> >  --
> >  Monitoring - Log Management - Alerting - Anomaly Detection
> >  Solr & Elasticsearch Consulting Support Training -
> > http://sematext.com/
> > 
> > 
> > 
> > > On 14 Mar 2018, at 04:21, Albert Lee 
> > wrote:
> > >
> > > Dear Solr,
> > > I want to whether solr support query by this year or this month?
> > > If can, how to do that.
> > > Thanks.
> > >
> > > Regards,
> > > Albert
> > >
> > 
> > 
> > >>>
> > >>>
> > >>
> > >

RE: solr query

2018-03-15 Thread Stefan Matheis
> You have any other idea?

Yes, we go back to start and discuss again why you're not adding a separate
field for that. It's the simplest thing possible and avoids all those
workarounds that got mentioned.
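
To make "the simplest thing possible" concrete, a sketch (field names made
up):

<!-- one dynamic rule covers birthdate_month, hiredate_month, ... -->
<dynamicField name="*_month" type="pint" indexed="true" stored="false"
              docValues="true"/>

Index birthdate_month=3 next to the full birthdate, and "people whose
birthday is this month" becomes a plain fq=birthdate_month:3 - no ranges
and no per-year OR chains.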

-Stefan

On Mar 15, 2018 4:08 AM, "Albert Lee"  wrote:

> Hi Emir,
>
> If using OR-ed conditions for different years then the query will be very
> long if I got 100 years and I think this is not practical.
> You have any other idea?
>
> Regards,
> Albert
> From: Gus Heck
> Sent: Thursday, March 15, 2018 12:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: solr query
>
> I think you have inadvertently "corrected" the intentional exclusive end on
> my range... [NOW/MONTH TO NOW/MONTH+1MONTH}
>
> On Wed, Mar 14, 2018 at 12:08 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
> > Hi Gus,
> > It is just current month, but Albert is interested in month, regardless
> of
> > year. It can be done with OR-ed conditions for different years:
> > birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH] OR birthDate:[NOW-1YEAR/MONTH
> TO
> > NOW-1YEAR/MONTH+1MONTH] OR birthDate:[NOW-2YEAR/MONTH TO
> > NOW-2YEAR/MONTH+1MONTH] OR…
> >
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 14 Mar 2018, at 16:55, Gus Heck  wrote:
> > >
> > > I think you can specify the current month with
> > >
> > > birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH}
> > >
> > > does that work for you?
> > >
> > > On Wed, Mar 14, 2018 at 6:32 AM, Emir Arnautović <
> > > emir.arnauto...@sematext.com> wrote:
> > >
> > >> Actually you don’t have to add another field - there is function ms
> that
> > >> converts date to timestamp. What you can do is use frange query parser
> > and
> > >> play bit with math, e.g. sub(ms(date_field),ms(NOW/YEAR)) will give
> you
> > >> ms elapsed since this year and you know that from 0 to 31*86400000 is
> > >> January, from 31*86400000+1 to … is February and so on.
> > >>
> > >> If you go this path, I would suggest custom function that will convert
> > >> date to month/year.
> > >>
> > >> HTH,
> > >> Emir
> > >> --
> > >> Monitoring - Log Management - Alerting - Anomaly Detection
> > >> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> > >>
> > >>
> > >>
> > >>> On 14 Mar 2018, at 10:53, Albert Lee 
> wrote:
> > >>>
> > >>> I don’t want to add separate fields since I have many dates to index.
> > >> How to index it as timestamp and do function query, any example or
> > >> documentation?
> > >>>
> > >>> Regards,
> > >>> Albert
> > >>>
> > >>> From: Emir Arnautović
> > >>> Sent: Wednesday, March 14, 2018 5:38 PM
> > >>> To: solr-user@lucene.apache.org
> > >>> Subject: Re: solr query
> > >>>
> > >>> Hi Albert,
> > >>> The simplest solution is to index month/year as separate fields.
> > >> Alternative is to index it as timestamp and do function query to do
> some
> > >> math and filter out records.
> > >>>
> > >>> Emir
> > >>> --
> > >>> Monitoring - Log Management - Alerting - Anomaly Detection
> > >>> Solr & Elasticsearch Consulting Support Training -
> > http://sematext.com/
> > >>>
> > >>>
> > >>>
> >  On 14 Mar 2018, at 10:31, Albert Lee 
> wrote:
> > 
> >  NOW/MONTH and NOW/YEAR to get the start of month/year, but how can I
> > >> get current month of regardless year. Like the use case,  people who’s
> > >> birthdate is this month?
> > 
> >  Regard,
> >  Albert
> > 
> > 
> >  From: Emir Arnautović
> >  Sent: Wednesday, March 14, 2018 5:26 PM
> >  To: solr-user@lucene.apache.org
> >  Subject: Re: solr query
> > 
> >  Hi Albert,
> >  It does - you can use NOW/MONTH and NOW/YEAR to get the start of
> > >> month/year. Here is reference to date math:
> https://lucene.apache.org/
> > >> solr/guide/6_6/working-with-dates.html#WorkingwithDates-
> DateMathSyntax
> > <
> > >> https://lucene.apache.org/solr/guide/6_6/working-with-
> > >> dates.html#WorkingwithDates-DateMathSyntax>
> > 
> >  HTH,
> >  Emir
> >  --
> >  Monitoring - Log Management - Alerting - Anomaly Detection
> >  Solr & Elasticsearch Consulting Support Training -
> > http://sematext.com/
> > 
> > 
> > 
> > > On 14 Mar 2018, at 04:21, Albert Lee 
> > wrote:
> > >
> > > Dear Solr,
> > > I want to whether solr support query by this year or this month?
> > > If can, how to do that.
> > > Thanks.
> > >
> > > Regards,
> > > Albert
> > >
> > 
> > 
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > > --
> > > http://www.the111shift.com
> >
> >
>
>
> --
> http://www.the111shift.com
>
>