Re: Solr config upgrade tool

2017-08-11 Thread Hrishikesh Gadre
Interesting. This is certainly useful. I was also thinking about having a
tool which can set up a Solr server of the intended version and test out the
upgraded configuration.

The validity of configuration files can be checked by creating a Solr
collection (or core) and verifying that it is loaded successfully.

Regarding indexing and querying - I think we can simplify this by having
the user provide sample data for indexing, along with a set of queries
(and their expected results) to try out. The tool can then
automatically index the documents, run the queries, and verify the results.
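A minimal sketch of that verification loop (the spec format, index contents, and stand-in query runner below are all invented for illustration, not part of any real tool):

```python
# Verify a set of queries against expected results, per the idea above.
def verify(spec, run_query):
    """Return a list of (query, expected, got) tuples for failing cases."""
    failures = []
    for case in spec["queries"]:
        got = run_query(case["q"])
        if got != case["expect_ids"]:
            failures.append((case["q"], case["expect_ids"], got))
    return failures

# Canned stand-in for a real Solr round trip (index + query).
index = {"doc1": "apache solr upgrade", "doc2": "lucene index"}

def fake_run_query(q):
    # Naive substring match standing in for a real /select request.
    return sorted(i for i, text in index.items() if q in text)

spec = {"queries": [{"q": "solr", "expect_ids": ["doc1"]},
                    {"q": "lucene", "expect_ids": ["doc2"]}]}

print(verify(spec, fake_run_query))  # [] means every query matched expectations
```

In a real run, fake_run_query would be replaced by an HTTP call against the freshly created collection.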

-Hrishikesh

On Fri, Aug 11, 2017 at 1:29 PM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Hrishikesh Gadre,
>
> I'm interested in how that might integrate with continuous integration.  I
> briefly worked on a tool to try a configuration out with SolrCloud, e.g.
> upload the config, create a collection, run some stuff, test some stuff.  I
> got the first two working, but not the "run some stuff" and "test some
> stuff" parts of this.
>
> It seemed to me that since the main purposes of config are handlers and
> fields, using SimplePostTool to import data and such would be useful.
>  I wanted to stick to Java to validate that this stuff is really as
> easy to do in Java as in Python/Node/Ruby.   However, it's not, which means a
> real solution would be opinionated and pick a language.
>
> If curious, you can look at my work at
> https://github.com/danizen/solr-config-tool.   The main innovation is to use
> JUnit 4 POJO files as tests, because TestController creates a configured test
> case containing all of the test cases.   The other benefit of making these test
> cases in a Maven project is that a Maven-based project that is using the
> solr-config-tool command-line could include these tests explicitly in a
> test plan, as well as its own tests.   In this case, they'd parameterize
> their tests using properties, and then the SolrConfig object would tell them
> where to find the Solr collection under test.
>
> My intention was for this to help with big-iron investments in Solr, where
> they don't want to spin-up 5 new nodes to test one small schema, which
> would have been the case for my institution.
>
> Maybe I'll give this tool some love soon...
>
> -Original Message-
> From: Hrishikesh Gadre [mailto:gadre.s...@gmail.com]
> Sent: Friday, August 11, 2017 4:17 PM
> To: solr-user@lucene.apache.org; d...@lucene.apache.org
> Subject: Solr config upgrade tool
>
> Hi,
>
> I am currently working on a tool to identify (and in some cases fix) the
> incompatibilities in the Solr configuration (e.g. schema.xml,
> solrconfig.xml etc.) between major versions of Solr.
>
> My goal is to simplify the Solr upgrade process by providing upgrade
> instructions tailored to your configuration. These instructions can help you
> answer the following questions:
>
> - Does my Solr configuration have any backwards incompatible sections? If
> yes which ones?
> - For each incompatibility - what do I need to do to fix it? Where can I
> get more information about why this incompatibility was introduced (e.g.
> references to Lucene/Solr Jira)?
> - Are there any changes in Lucene/Solr which would require me to do a full
> reindexing OR can I get away with an index upgrade?
>
> The initial prototype of this tool is available @
> https://github.com/hgadre/solr-upgrade-tool
>
> For comments (or suggestions) please refer to
> https://issues.apache.org/jira/browse/SOLR-11229
>
> I have implemented upgrade rules for Solr 4 -> Solr 5 and Solr 5 -> Solr 6
> by referring to upgrade instructions in the Solr release notes. But during
> testing I found that many of the configuration changes (e.g. plugin
> deprecations) are not listed in the upgrade section. I am working on
> identifying such configuration changes and adding upgrade rules
> accordingly. I would love any community participation in improving this
> tool.
>
> I will be presenting a talk titled "Apache Solr - Upgrading your upgrade
> experience" at Lucene Revolution this year. Please do attend to learn more
> about simplifying the Solr upgrade process.
>
> Regards
>
> Hrishikesh
>


Re: Fetch a binary field

2017-08-11 Thread Dave
Why didn't you set it to be indexed? Sure, it would make a small dent in the index size.

> On Aug 11, 2017, at 5:20 PM, Barbet Alain  wrote:
> 
> Re,
> I took a look at the source code where this message comes from:
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/schema/SchemaField.java#L186
> 
> I use version 6.5, which differs from master.
> In 6.5:
> if (! (indexed() || hasDocValues()) ) {
> As my field is not indexed (and hasDocValues() is false for a binary field),
> this test fails.
> OK, but then what is the way to get this field, given that I can see it in
> Luke (in hex format)?
> 
> 
> 2017-08-11 15:41 GMT+02:00 Barbet Alain :
>> Hi !
>> 
>> 
>> I have a Lucene index coming from a C++ program linked with Lucene++, a
>> port of Lucene 3.5.9. When I open this index with Luke, it shows Lucene
>> 2.9. I can see a binary field I have in Luke, with data encoded in
>> base64.
>> 
>> 
>> I have upgraded this index from 2.9 => 4.0 => 5.0 => 6.0 so I can use it
>> with Solr 6.5.1. I rebuilt a schema for this index, which has one field
>> of binary type:
>> 
>> <field name="document" type="binary" indexed="false" stored="true"/>
>> 
>> When I try to retrieve this field with a query (from the Solr admin
>> interface) it fails with "can not use FieldCache on a field which is
>> neither indexed nor has doc values: document"
>> 
>> 
>> I can retrieve other fields, but can't find a way for this one. Does
>> someone have an idea? (In the end I want to do this with PHP, but it
>> fails with the Solr admin interface too.)
>> 
>> Thank you for any help !


Re: Fetch a binary field

2017-08-11 Thread Barbet Alain
Re,
I took a look at the source code where this message comes from:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/schema/SchemaField.java#L186

I use version 6.5, which differs from master.
In 6.5:
if (! (indexed() || hasDocValues()) ) {
As my field is not indexed (and hasDocValues() is false for a binary field),
this test fails.
OK, but then what is the way to get this field, given that I can see it in
Luke (in hex format)?
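For what it's worth, once the stored value can be fetched, decoding it is straightforward: Solr's response writers emit a stored solr.BinaryField as a base64 string - the same bytes Luke renders in hex. A Python sketch with an illustrative payload (not real data from this index):

```python
import base64

# An illustrative stored value as it would appear in a Solr response.
stored_value = base64.b64encode(b"\x01\x02hello").decode("ascii")

# Decode back to raw bytes and show them the way Luke does (hex).
raw = base64.b64decode(stored_value)
print(raw.hex())  # 010268656c6c6f
```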


2017-08-11 15:41 GMT+02:00 Barbet Alain :
> Hi !
>
>
> I have a Lucene index coming from a C++ program linked with Lucene++, a
> port of Lucene 3.5.9. When I open this index with Luke, it shows Lucene
> 2.9. I can see a binary field I have in Luke, with data encoded in
> base64.
>
>
> I have upgraded this index from 2.9 => 4.0 => 5.0 => 6.0 so I can use it
> with Solr 6.5.1. I rebuilt a schema for this index, which has one field
> of binary type:
>
>
> <field name="document" type="binary" indexed="false" stored="true"/>
>
>
> When I try to retrieve this field with a query (from the Solr admin
> interface) it fails with "can not use FieldCache on a field which is
> neither indexed nor has doc values: document"
>
>
> I can retrieve other fields, but can't find a way for this one. Does
> someone have an idea? (In the end I want to do this with PHP, but it
> fails with the Solr admin interface too.)
>
> Thank you for any help !


Re: How to boost entire json document on top of results

2017-08-11 Thread Abhijit Pawar
Yes... that's true. For now I can sort based on that field, but I was wondering
how index-time boost really works at the document level for JSON documents
indexed using a DataImportHandler.

Thanks, Erick!

Regards,

Abhijit


On Fri, Aug 11, 2017 at 1:50 PM, Erick Erickson 
wrote:

> Well, you have to _do_ something with the field. And you don't
> particularly want to do document boosting at index time in the first
> place, as this has been removed recently IIRC. Note this is different
> from just putting a _value_ in some field you _use_ for boosting.
>
> Anyway, you state:
> "I would like to have documents with boost value 30 to be shown above
> the documents with value 10 or 20"
>
> If it's acceptable that all docs with a value of 30 appear before all
> docs with a value of 20 which appear before all docs with a value of
> 10, then this is just a simple sort
>
> sort=boost desc,score desc
>
> If you want to do something fancier, see Function Queries in the ref
> guide. Or use edismax query parsing and add a boost query.
>
> Best,
> Erick
>
>
>
> On Fri, Aug 11, 2017 at 10:19 AM, Abhijit Pawar
>  wrote:
> > Hi,
> >
> > I am working on a scenario trying to boost certain documents while
> indexing
> > over other documents.
> >
> > I am using  a DataImportHandler to import JSON documents from a MongoDB
> > datasource.
> >
> > Here's my data-source-config file :
> >
> > <dataConfig>
> >   <dataSource name="mongod"
> >       driver="com.mongodb.jdbc.MongoDriver" url="mongodb://<>:27017/CORE"/>
> >   <script><![CDATA[
> >   function BoostDoc(row) {
> >      if(row.get('boost') == '10') {
> >         row.put('$docBoost', 10);
> >      }
> >      if(row.get('boost') == '20') {
> >         row.put('$docBoost', 20);
> >      }
> >      if(row.get('boost') == '30') {
> >         row.put('$docBoost', 30);
> >      }
> >      return row;
> >   }
> >   ]]></script>
> >   <document>
> >     <entity dataSource="mongod"
> >         transformer="ProdsCatsFieldTransformer,script:BoostDoc,TemplateTransformer"
> >         onError="continue"
> >         pk="uuid"
> >         query="SELECT idStr,orgidStr,name,boost,code,description
> >         FROM products"
> >         deltaImportQuery="SELECT idStr,orgidStr,name,boost,code,description
> >         FROM products"...
> >         deltaQuery="SELECT idStr FROM products WHERE orgidStr 
> >         ...
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > I have added omitNorms to the unique identifier - uuid.
> >
> >
> > With this configuration I am unable to see any change in the results
> > returned while searching. Basically I would like documents with boost
> > value 30 to be shown above the documents with value 10 or 20.
> >
> > Is there a change I need to make in order to apply boost values using
> > $docBoost with the ScriptTransformer?
> >
> > Also, is there any alternative approach I could use to get the
> > boosted documents to show up at the top of the results list?
> >
> >
> >
> > Regards,
> >
> > Abhijit
>


RE: Solr config upgrade tool

2017-08-11 Thread Davis, Daniel (NIH/NLM) [C]
Hrishikesh Gadre,

I'm interested in how that might integrate with continuous integration.  I 
briefly worked on a tool to try a configuration out with SolrCloud, e.g. upload 
the config, create a collection, run some stuff, test some stuff.  I got the 
first two working, but not the "run some stuff" and "test some stuff" parts of 
this.

It seemed to me that since the main purposes of config are handlers and fields, 
using SimplePostTool to import data and such would be useful.   I wanted 
to stick to Java to validate that this stuff is really as easy to do in 
Java as in Python/Node/Ruby.   However, it's not, which means a real solution 
would be opinionated and pick a language.

If curious, you can look at my work at 
https://github.com/danizen/solr-config-tool.   The main innovation is to 
use JUnit 4 POJO files as tests, because TestController creates a 
configured test case containing all of the test cases.   The other benefit of 
making these test cases in a Maven project is that a Maven-based project that 
is using the solr-config-tool command-line could include these tests explicitly 
in a test plan, as well as its own tests.   In this case, they'd 
parameterize their tests using properties, and then the SolrConfig object would 
tell them where to find the Solr collection under test.

My intention was for this to help with big-iron investments in Solr, where they 
don't want to spin-up 5 new nodes to test one small schema, which would have 
been the case for my institution.

Maybe I'll give this tool some love soon...

-Original Message-
From: Hrishikesh Gadre [mailto:gadre.s...@gmail.com] 
Sent: Friday, August 11, 2017 4:17 PM
To: solr-user@lucene.apache.org; d...@lucene.apache.org
Subject: Solr config upgrade tool

Hi,

I am currently working on a tool to identify (and in some cases fix) the 
incompatibilities in the Solr configuration (e.g. schema.xml, solrconfig.xml 
etc.) between major versions of Solr.

My goal is to simplify the Solr upgrade process by providing upgrade 
instructions tailored to your configuration. These instructions can help you 
answer the following questions:

- Does my Solr configuration have any backwards incompatible sections? If yes 
which ones?
- For each incompatibility - what do I need to do to fix it? Where can I 
get more information about why this incompatibility was introduced (e.g. 
references to Lucene/Solr Jira)?
- Are there any changes in Lucene/Solr which would require me to do a full 
reindexing OR can I get away with an index upgrade?

The initial prototype of this tool is available @ 
https://github.com/hgadre/solr-upgrade-tool

For comments (or suggestions) please refer to
https://issues.apache.org/jira/browse/SOLR-11229

I have implemented upgrade rules for Solr 4 -> Solr 5 and Solr 5 -> Solr 6 by 
referring to upgrade instructions in the Solr release notes. But during testing 
I found that many of the configuration changes (e.g. plugin
deprecations) are not listed in the upgrade section. I am working on 
identifying such configuration changes and adding upgrade rules accordingly. I 
would love any community participation in improving this tool.

I will be presenting a talk titled "Apache Solr - Upgrading your upgrade 
experience" at Lucene Revolution this year. Please do attend to learn more 
about simplifying the Solr upgrade process.

Regards

Hrishikesh


Solr config upgrade tool

2017-08-11 Thread Hrishikesh Gadre
Hi,

I am currently working on a tool to identify (and in some cases fix) the
incompatibilities in the Solr configuration (e.g. schema.xml,
solrconfig.xml etc.) between major versions of Solr.

My goal is to simplify the Solr upgrade process by providing upgrade
instructions tailored to your configuration. These instructions can help you
answer the following questions:

- Does my Solr configuration have any backwards incompatible sections? If
yes which ones?
- For each incompatibility - what do I need to do to fix it? Where can I
get more information about why this incompatibility was introduced (e.g.
references to Lucene/Solr Jira)?
- Are there any changes in Lucene/Solr which would require me to do a full
reindexing OR can I get away with an index upgrade?
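A toy version of the first check - scanning the config XML for class names that a rule table flags - can illustrate the idea. The rule table and config snippet here are invented for illustration; they are not the actual rule format used by the tool:

```python
import xml.etree.ElementTree as ET

# Illustrative rule table: flagged class name -> human-readable upgrade note.
DEPRECATED = {
    "solr.admin.AdminHandlers": "remove; admin handlers are implicitly defined in later releases",
}

def find_incompatibilities(config_xml):
    """Return (tag, class, note) for every element using a flagged class."""
    hits = []
    for el in ET.fromstring(config_xml).iter():
        cls = el.get("class")
        if cls in DEPRECATED:
            hits.append((el.tag, cls, DEPRECATED[cls]))
    return hits

cfg = '<config><requestHandler name="/admin/" class="solr.admin.AdminHandlers"/></config>'
print(find_incompatibilities(cfg))
```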

The initial prototype of this tool is available @
https://github.com/hgadre/solr-upgrade-tool

For comments (or suggestions) please refer to
https://issues.apache.org/jira/browse/SOLR-11229

I have implemented upgrade rules for Solr 4 -> Solr 5 and Solr 5 -> Solr 6
by referring to upgrade instructions in the Solr release notes. But during
testing I found that many of the configuration changes (e.g. plugin
deprecations) are not listed in the upgrade section. I am working on
identifying such configuration changes and adding upgrade rules
accordingly. I would love any community participation in improving this
tool.

I will be presenting a talk titled "Apache Solr - Upgrading your upgrade
experience" at Lucene Revolution this year. Please do attend to learn more
about simplifying the Solr upgrade process.

Regards

Hrishikesh


Re: How to boost entire json document on top of results

2017-08-11 Thread Erick Erickson
Well, you have to _do_ something with the field. And you don't
particularly want to do document boosting at index time in the first
place, as this has been removed recently IIRC. Note this is different
from just putting a _value_ in some field you _use_ for boosting.

Anyway, you state:
"I would like to have documents with boost value 30 to be shown above
the documents with value 10 or 20"

If it's acceptable that all docs with a value of 30 appear before all
docs with a value of 20 which appear before all docs with a value of
10, then this is just a simple sort

sort=boost desc,score desc

If you want to do something fancier, see Function Queries in the ref
guide. Or use edismax query parsing and add a boost query.

Best,
Erick
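The effect of that two-level sort can be pictured with made-up documents:

```python
# Made-up docs: 'boost' is the user-supplied field, 'score' the relevancy score.
docs = [
    {"id": "a", "boost": 10, "score": 0.9},
    {"id": "b", "boost": 30, "score": 0.2},
    {"id": "c", "boost": 20, "score": 0.7},
    {"id": "d", "boost": 30, "score": 0.8},
]

# sort=boost desc,score desc: boost tiers come first; score only breaks ties.
ranked = sorted(docs, key=lambda d: (-d["boost"], -d["score"]))
print([d["id"] for d in ranked])  # ['d', 'b', 'c', 'a']
```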



On Fri, Aug 11, 2017 at 10:19 AM, Abhijit Pawar
 wrote:
> Hi,
>
> I am working on a scenario trying to boost certain documents while indexing
> over other documents.
>
> I am using  a DataImportHandler to import JSON documents from a MongoDB
> datasource.
>
> Here's my data-source-config file :
>
> <dataConfig>
>   <dataSource name="mongod"
>       driver="com.mongodb.jdbc.MongoDriver" url="mongodb://<>:27017/CORE"/>
>   <script><![CDATA[
>   function BoostDoc(row) {
>      if(row.get('boost') == '10') {
>         row.put('$docBoost', 10);
>      }
>      if(row.get('boost') == '20') {
>         row.put('$docBoost', 20);
>      }
>      if(row.get('boost') == '30') {
>         row.put('$docBoost', 30);
>      }
>      return row;
>   }
>   ]]></script>
>   <document>
>     <entity dataSource="mongod"
>         transformer="ProdsCatsFieldTransformer,script:BoostDoc,TemplateTransformer"
>         onError="continue"
>         pk="uuid"
>         query="SELECT idStr,orgidStr,name,boost,code,description
>         FROM products"
>         deltaImportQuery="SELECT idStr,orgidStr,name,boost,code,description
>         FROM products"...
>         deltaQuery="SELECT idStr FROM products WHERE orgidStr 
>         ...
>     </entity>
>   </document>
> </dataConfig>
>
> I have added omitNorms to the unique identifier - uuid.
>
>
> With this configuration I am unable to see any change in the results
> returned while searching. Basically I would like documents with boost
> value 30 to be shown above the documents with value 10 or 20.
>
> Is there a change I need to make in order to apply boost values using
> $docBoost with the ScriptTransformer?
>
> Also, is there any alternative approach I could use to get the
> boosted documents to show up at the top of the results list?
>
>
>
> Regards,
>
> Abhijit


Re: missing documents after restart

2017-08-11 Thread Erick Erickson
Thanks for closing this out, I was breaking out in hives ;)

Erick

On Fri, Aug 11, 2017 at 11:31 AM, John Blythe  wrote:
> Looks like part of our nightly processing was restarting the Solr server
> before all indexing was done, because we were using a blunt-object approach of
> restarting at designated times, doh!
>
> On Tue, Aug 8, 2017 at 9:35 PM John Blythe  wrote:
>
>> Thanks Erick. I don't think all of those ifs are in place. Must be
>> something in our nightly process that is conflicting. Will dive in tomorrow
>> to figure out and report back.
>>
>> On Tue, Aug 8, 2017 at 1:27 PM Erick Erickson 
>> wrote:
>>
>>> First, are you absolutely sure you're committing before shutting down?
>>> Hard commit in this case, openSearcher shouldn't matter.
>>>
>>> SolrCloud? And if not SolrCloud, how are you shutting Solr down? "Kill
>>> -9" is evil.
>>>
>>> If you have transaction logs enabled then you shouldn't be losing
>>> docs, any uncommitted docs should be replayed from the transaction
>>> log.
>>>
>>> This should absolutely _not_ be happening assuming hard commits
>>> happen. One possible explanation is that you have hard commits turned
>>> off and are using soft commits _and_ the transaction log isn't
>>> enabled. And you kill Solr with evil intent. But that's a long chain
>>> of "ifs"
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Aug 8, 2017 at 6:02 AM, John Blythe  wrote:
>>> > hi all.
>>> >
>>> > i have a core that contains about 22 million documents. when the solr
>>> > server is restarted it drops to 200-400k. the dashboard says that it's
>>> both
>>> > optimized and current.
>>> >
>>> > are there config issues i need to address in solr or the server? not
>>> really
>>> > sure where to begin in hunting this down.
>>> >
>>> > thanks-
>>>
>> --
>> --
>> *John Blythe*
>> Product Manager & Lead Developer
>>
>> 251.605.3071 | j...@curvolabs.com
>> www.curvolabs.com
>>
>> 58 Adams Ave
>> Evansville, IN 47713
>>
> --
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713


Re: missing documents after restart

2017-08-11 Thread John Blythe
Looks like part of our nightly processing was restarting the Solr server
before all indexing was done, because we were using a blunt-object approach of
restarting at designated times, doh!

On Tue, Aug 8, 2017 at 9:35 PM John Blythe  wrote:

> Thanks Erick. I don't think all of those ifs are in place. Must be
> something in our nightly process that is conflicting. Will dive in tomorrow
> to figure out and report back.
>
> On Tue, Aug 8, 2017 at 1:27 PM Erick Erickson 
> wrote:
>
>> First, are you absolutely sure you're committing before shutting down?
>> Hard commit in this case, openSearcher shouldn't matter.
>>
>> SolrCloud? And if not SolrCloud, how are you shutting Solr down? "Kill
>> -9" is evil.
>>
>> If you have transaction logs enabled then you shouldn't be losing
>> docs, any uncommitted docs should be replayed from the transaction
>> log.
>>
>> This should absolutely _not_ be happening assuming hard commits
>> happen. One possible explanation is that you have hard commits turned
>> off and are using soft commits _and_ the transaction log isn't
>> enabled. And you kill Solr with evil intent. But that's a long chain
>> of "ifs"
>>
>> Best,
>> Erick
>>
>> On Tue, Aug 8, 2017 at 6:02 AM, John Blythe  wrote:
>> > hi all.
>> >
>> > i have a core that contains about 22 million documents. when the solr
>> > server is restarted it drops to 200-400k. the dashboard says that it's
>> both
>> > optimized and current.
>> >
>> > are there config issues i need to address in solr or the server? not
>> really
>> > sure where to begin in hunting this down.
>> >
>> > thanks-
>>
> --
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
-- 
-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713


How to boost entire json document on top of results

2017-08-11 Thread Abhijit Pawar
Hi,

I am working on a scenario trying to boost certain documents while indexing
over other documents.

I am using  a DataImportHandler to import JSON documents from a MongoDB
datasource.

Here's my data-source-config file :

<dataConfig>
  <dataSource name="mongod"
      driver="com.mongodb.jdbc.MongoDriver" url="mongodb://<>:27017/CORE"/>
  <script><![CDATA[
  function BoostDoc(row) {
     if(row.get('boost') == '10') {
        row.put('$docBoost', 10);
     }
     if(row.get('boost') == '20') {
        row.put('$docBoost', 20);
     }
     if(row.get('boost') == '30') {
        row.put('$docBoost', 30);
     }
     return row;
  }
  ]]></script>
  <document>
    <entity dataSource="mongod"
        transformer="ProdsCatsFieldTransformer,script:BoostDoc,TemplateTransformer"
        onError="continue"
        pk="uuid"
        query="SELECT idStr,orgidStr,name,boost,code,description FROM products"
        deltaImportQuery="SELECT idStr,orgidStr,name,boost,code,description FROM products"...
        deltaQuery="SELECT idStr FROM products WHERE orgidStr 
        ...
    </entity>
  </document>
</dataConfig>
I have added omitNorms to the unique identifier - uuid.


With this configuration I am unable to see any change in the results
returned while searching. Basically I would like documents with boost
value 30 to be shown above the documents with value 10 or 20.

Is there a change I need to make in order to apply boost values using
$docBoost with the ScriptTransformer?

Also, is there any alternative approach I could use to get the
boosted documents to show up at the top of the results list?



Regards,

Abhijit


Re: Soft commit uploading datas cant search on website

2017-08-11 Thread Erick Erickson
First, if you specify commit=true it's doing a hard commit with
openSearcher=true by default, so the softCommit isn't necessary here.
I'd do one or the other, as it's possible that Solr is stopping at the
first one.

bq: when i do the hardcommit manually . then its shows the result on website.

I don't know what that means in this context.

The most obvious difference is that you have "fq" clauses in the one
that doesn't work, what happens when you remove them?

Best,
Erick
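One way to act on that suggestion is to replay the website's query (parameters copied from the 0-hit log entry quoted below) with the fq clauses stripped; if hits jumps from 0 to 1, one of the filters is excluding the new document:

```python
from urllib.parse import urlencode

# The website query rebuilt as parameters (from the 0-hit log entry).
params = [
    ("q", 'keyword:"*solrtest*"'),
    ("start", "0"),
    ("fq", "storePrice:[0 TO 9]"),
    ("fq", "goodsStatus:0"),
    ("sort", "goodsClick desc"),
    ("rows", "12"),
]

# Drop the filter queries and rebuild the /select query string to reissue.
without_fq = [(k, v) for k, v in params if k != "fq"]
print(urlencode(without_fq))
```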

On Fri, Aug 11, 2017 at 3:00 AM, Abdul Ekfhy  wrote:
> I configured the softCommit in solrconfig.xml, and when I add a new
> entry (example: solrtest) from the website and run the commit URL below:
> "http://192.168.2.10:8983/solr/goods/update?softCommit=true&commit=true "
>
> when I run this and check via the query page, keyword:solrtest shows the
> entry in XML format,
>
> and it also increases numDocs for goods by 1,
>
> but it does not show up in the website search.
>
> when I do the hard commit manually, then it shows the result on the website.
>
> any idea what the issue could be?
>
> my configurations are below
>
> solrconfig.xml
>
>   <autoCommit>
>     <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>
>   <autoSoftCommit>
>     <maxTime>1</maxTime>
>   </autoSoftCommit>
>
>
> solr.log
> ..
> ..
> search via the query page (it shows hits=1)
> .
> 2017-08-11 09:15:28.060 INFO  (qtp985934102-14) [c:goods s:shard1
> r:core_node3 x:goods_v6.6] o.a.s.c.S.Request [goods_v6.6]  webapp=/solr
> path=/select params={q=keyword:solrtest&indent=on&wt=json&_=1502439335248}
> hits=1 status=0 QTime=1
> .
> search via the website (it shows hits=0)
> ..
> 2017-08-11 09:14:37.226 INFO  (qtp985934102-47) [c:goods s:shard1
> r:core_node3 x:goods_v6.6] o.a.s.c.S.Request [goods_v6.6]  webapp=/solr
> path=/select
> params={q=keyword:"*solrtest*"&start=0&fq=storePrice:[0+TO+9]&fq=goodsStatus:0&sort=goodsClick+desc&rows=12&wt=javabin&version=2}
> hits=0 status=0 QTime=2
>
>
> are there any more configurations I need to add to solrconfig.xml?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Soft-commit-uploading-datas-cant-search-on-website-tp4350186.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with query syntax

2017-08-11 Thread Erick Erickson
Yep..

On Fri, Aug 11, 2017 at 6:31 AM, OTH  wrote:
> Hi, thanks for sharing the article.
>
> On Fri, Aug 11, 2017 at 4:38 AM, Erick Erickson 
> wrote:
>
>> Omer:
>>
>> Solr does not implement pure boolean logic, see:
>> https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.
>>
>> With appropriate parentheses it can give the same results as you're
>> discovering.
>>
>> Best
>> Erick
>>
>> On Thu, Aug 10, 2017 at 3:00 PM, OTH  wrote:
>> > Thanks for the help!
>> > That's resolved the issue.
>> >
>> > On Fri, Aug 11, 2017 at 1:48 AM, David Hastings <
>> > hastings.recurs...@gmail.com> wrote:
>> >
>> >> type:value AND (name:america^1+name:state^1+name:united^1)
>> >>
>> >> but in reality what you want to do is use the fq parameter with
>> type:value
>> >>
>> >> On Thu, Aug 10, 2017 at 4:36 PM, OTH  wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > I have the following use case:
>> >> >
>> >> > I have two fields (among others); one is 'name' and the other is
>> 'type'.
>> >> >  'Name' is the field I need to search, whereas, with 'type', I need to
>> >> make
>> >> > sure that it has a certain value, depending on the situation.  Often,
>> >> when
>> >> > I search the 'name' field, the search query would have multiple
>> tokens.
>> >> > Furthermore, each query token needs to have a scoring weight attached
>> to
>> >> > it.
>> >> >
>> >> > However, I'm unable to figure out the syntax which would allow all
>> these
>> >> > things to happen.
>> >> >
>> >> > For example, if I use the following query:
>> >> > select?q=type:value+AND+name:america^1+name:state^1+name:united^1
>> >> > It would only return documents where 'name' includes the token
>> 'america'
>> >> > (and where type==value).  It will totally ignore
>> >> > "+name:state^1+name:united^1", it seems.
>> >> >
>> >> > This does not happen if I omit "type:value+AND+".  So, with the
>> following
>> >> > query:
>> >> > select?q=name:america^1+name:state^1+name:united^1
>> >> > It returns all documents which contain any of the three tokens
>> {america,
>> >> > state, united}; which is what I need.  However, it also returns
>> documents
>> >> > where type != value; which I can't have.
>> >> >
>> >> > If I put "type:value" at the end of the query command, like so:
>> >> > select?q=name:america^1+name:state^1+name:united^1+AND+type:value
>> >> > In this case, it will only return documents which contain the "united"
>> >> > token in the name field (and where type==value).  Again, it will
>> totally
>> >> > ignore "name:america^1+name:state^1", it seems.
>> >> >
>> >> > I tried putting an "AND" between everything, like so:
>> >> > select?q=type:value+AND+name:america^1+AND+name:state^1+
>> >> AND+name:united^1
>> >> > But this, of course, would only return documents which contain all the
>> >> > tokens {america, state, united}; whereas I need all documents which
>> >> contain
>> >> > any of those tokens.
>> >> >
>> >> >
>> >> > If anyone could help me out with how this could be done / what the
>> >> correct
>> >> > syntax would be, that would be a huge help.
>> >> >
>> >> > Much thanks
>> >> > Omer
>> >> >
>> >>
>>
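A tiny evaluator over sample docs makes the grouping fix in this thread concrete - it models the intended semantics (the type must match AND any of the name tokens may match), not the Lucene parser itself:

```python
# Sample docs: only the first two should match; the third has the wrong type.
docs = [
    {"type": "value", "name": ["united", "states", "america"]},
    {"type": "value", "name": ["state"]},
    {"type": "other", "name": ["united"]},
]

def intended(doc):
    # type:value AND (name:america OR name:state OR name:united)
    return doc["type"] == "value" and bool(
        {"america", "state", "united"} & set(doc["name"]))

matches = [i for i, d in enumerate(docs) if intended(d)]
print(matches)  # [0, 1]
```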


RE: Token "states" not getting lemmatized by Solr?

2017-08-11 Thread Markus Jelsma
I checked our English analyzer using KStemFilter. To my surprise, neither united 
nor states is affected by the filter.

Regards,
Markus
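For reference, wiring in a stemmer of the kind discussed below is a one-line addition to the analyzer chain - a hedged sketch (the fieldType name here is invented; KStem shown, Snowball being the alternative):

```xml
<fieldType name="text_en_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- the stemmer; solr.SnowballPorterFilterFactory is the alternative -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```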

 
 
-Original message-
> From:Ahmet Arslan 
> Sent: Thursday 10th August 2017 21:57
> To: solr-user@lucene.apache.org
> Subject: Re: Token "states" not getting lemmatized by Solr?
> 
> Hi Omer,
> Your analysis chain does not include a stem filter (lemmatizer).
> Assuming you are dealing with English text, you can use KStemFilterFactory or 
> SnowballPorterFilterFactory.
> Ahmet
> 
> On Thursday, August 10, 2017, 9:33:08 PM GMT+3, OTH  
> wrote:
> 
> Hi,
> 
> Regarding 'analysis chain':
> 
> I'm using Solr 6.4.1, and in the managed-schema file, I find the following:
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> Regarding the Admin UI >> Analysis page:  I just tried that, and to be
> honest, I can't seem to get much useful info out of it, especially in terms
> of lemmatization.
> 
> For example, for any text I enter in it to "analyse", all it does is seem
> to tell me which analysers (if that's the right term?) are being used for
> the selected field / fieldtype, and for each of these analyzers, it would
> give some very basic info, like text, raw_bytes, etc.  Eg, for the input
> "united" in the "field value (index)" box, having "text_general" selected
> for fieldtype, all I get is this:
> 
> ST: text=united, raw_bytes=[75 6e 69 74 65 64], start=0, end=6, positionLength=1, position=1
> SF: text=united, raw_bytes=[75 6e 69 74 65 64], start=0, end=6, positionLength=1, position=1
> LCF: text=united, raw_bytes=[75 6e 69 74 65 64], start=0, end=6, positionLength=1, position=1
> Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
> "org.apache.lucene.analysis.standard.StandardTokenizer",
> "org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.
> 
> So - should 'states' not be lemmatized to 'state' using these settings?
> (If not, then I would need to figure out how to use a different lemmatizer)
> 
> Thanks
> 
> On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson 
> wrote:
> 
> > saying the field is "text_general" is not sufficient, please post the
> > analysis chain defined in your schema.
> >
> > Also the admin UI>>analysis page will help you figure out exactly what
> > part of the analysis chain does what.
> >
> > Best,
> > Erick
> >
> > On Thu, Aug 10, 2017 at 8:37 AM, OTH  wrote:
> > > Hello,
> > >
> > > It seems for me that the token "states" is not getting lemmatized to
> > > "state" by Solr.
> > >
> > > Eg, I have a document with the value "united states of america".
> > > This document is not returned when the following query is issued:
> > > q=name:state^1+name:america^1+name:united^1
> > > However, all documents which contain the token "state" are indeed
> > returned,
> > > with the above query.
> > > The "united states of america" document is returned if I change "state"
> > in
> > > the query to "states"; so:
> > > q=name:states^1+name:america^1+name:united^1
> > >
> > > At first I thought maybe the lemmatization isn't working for some reason.
> > > However, when I changed "united" in the query to "unite", then it did
> > still
> > > return the "united states of america" document:
> > > q=name:states^1+name:america^1+name:unite^1
> > > Which means that the lemmatization is working for the token "united", but
> > > not for the token "states".
> > >
> > > The "name" field above is defined as "text_general".
> > >
> > > So it seems to me, that perhaps the default Solr lemmatizer does not
> > > lemmatize "states" to "state"?
> > > Can anyone confirm if this is indeed the expected behaviour?
> > > And what can I do to change it?
> > > If I need to put in a custom lemmatizer, then what would be the (best)
> > > way to do that?
> > >
> > > Much thanks
> > > Omer
> >
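
For reference: a stock text_general chain of the shape described above (standard tokenizer, stop filter, lowercase filter) contains no stemmer at all, which would explain why "states" is never reduced to "state". A hedged sketch of a field type with a light English stemmer added — the type name is illustrative, not taken from the poster's schema:

```xml
<fieldType name="text_en_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- A light stemmer that strips plural endings, so "states" -> "state" -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The admin UI's Analysis page, as suggested above, is the quickest way to confirm what each stage of such a chain emits. Note that changing the analysis chain requires a full reindex.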


Solr Keyword search across docs?

2017-08-11 Thread Aaron Gibbons
I'm trying to produce a search that would optionally join the contents of 2
documents then allow a keyword search on them as if it were a single doc.

For example, I'd have a Person index and a Note index.  I want to search the
Person document combined with its Notes for keywords A AND B AND C, where A
and B are in the Person document but C is in the Person's Notes.

What is the best way to accomplish this and still preserve searching on
just Person or Note individually?

I tried a join, but it seems to keyword-search only the Person document and
then filter those down to the ones with matching Notes.
Simplified query:
q={!join+from=person_note_id+to=id}A+B+C&fq=type:Person


The only other option I can see is to index all of the Notes' contents on
the Person record. But this seems like a lot of duplication when I also have the
requirement to keep them in their own separate index to search on their own.

Thank you for any help!
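
One hedged sketch, assuming Person and Note stay separate documents: instead of joining once for the whole query, wrap each keyword so it can match either directly on the Person or via a join from its Notes, and AND the wrapped clauses together (field names here are illustrative):

```
q=+(keywords:A OR _query_:"{!join from=person_note_id to=id}note_text:A")
  +(keywords:B OR _query_:"{!join from=person_note_id to=id}note_text:B")
  +(keywords:C OR _query_:"{!join from=person_note_id to=id}note_text:C")
&fq=type:Person
```

Each clause is satisfied if the term appears on the Person itself or on any of its joined Notes, which behaves like a keyword search over the combined text while both document types remain searchable on their own. The `_query_` pseudo-field is the standard way to embed a nested query parser inside the lucene parser.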


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-11 Thread Shawn Heisey
On 8/11/2017 2:52 AM, SOLR4189 wrote:
> Yes, only because I'm seeing different results. 
>
> For example, can changing *WordDelimiterFilterFactory* to
> *WordDelimiterGraphFilterFactory* change the order of docs?
> (http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html)

I can't say for sure, but if that difference changes what parts of your
query match or don't match, that is very likely to affect document scores.

> For building index I tried 2 ways: 1) Dataimport from SOLR-4 to SOLR-6 and
> 2) IndexUpgraderTool
> And in both ways order of docs is different.

If you are changing things like WordDelimiterFilterFactory to the graph
version, you'll definitely want to reindex.  The IndexUpgrader tool is
not a reindex.  If the Solr 4 index meets the requirements of having all
relevant fields stored, then doing a dataimport from 4 to 6 would be the
same as a reindex.

Thanks,
Shawn



Re: solr commit is taking time

2017-08-11 Thread Shawn Heisey
On 8/11/2017 4:16 AM, Midas A wrote:
> our solr commit is taking time
>
> 10.20.73.92 - - [11/Aug/2017:15:44:00 +0530] "POST
> /solr/##/update?wt=javabin&version=2 HTTP/1.1" 200 - 12594
> What should I check?

https://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

Thanks,
Shawn



Fetch a binary field

2017-08-11 Thread Barbet Alain
Hi !


I have a Lucene index produced by a C++ program linked with Lucene++, a
port of Lucene 3.5.9. When I open this index with Luke, it shows Lucene
2.9. I can see a binary field I have in Luke, with data encoded in
base64.


I have upgraded this index from 2.9 => 4.0 => 5.0 => 6.0 so I can use it
with Solr 6.5.1. I rebuilt a schema for this index, which has one field
in binary:





When I try to retrieve this field with a query (via the Solr admin
interface) it fails with "can not use FieldCache on a field which is
neither indexed nor has doc values: document".


I can retrieve other fields, but can't find a way for this one. Does
someone have an idea? (In the end I want to do this with PHP, but it
fails with the Solr admin interface too.)

Thank you for any help !
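
A hedged guess: that FieldCache error usually comes from using the field in a sort, facet, or function query, not from plain retrieval. If the binary field is stored, simply requesting it in the `fl` list should bypass the FieldCache entirely — a sketch, with the core name illustrative and the field name "document" taken from the error message above:

```
http://localhost:8983/solr/mycore/select?q=*:*&fl=id,document&wt=json
```

If the admin UI query still fails, it is worth checking whether the UI request includes that field anywhere other than `fl` (for example in `sort`).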


Re: Need help with query syntax

2017-08-11 Thread OTH
Hi, thanks for sharing the article.

On Fri, Aug 11, 2017 at 4:38 AM, Erick Erickson 
wrote:

> Omer:
>
> Solr does not implement pure boolean logic, see:
> https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.
>
> With appropriate parentheses it can give the same results as you're
> discovering.
>
> Best
> Erick
>
> On Thu, Aug 10, 2017 at 3:00 PM, OTH  wrote:
> > Thanks for the help!
> > That's resolved the issue.
> >
> > On Fri, Aug 11, 2017 at 1:48 AM, David Hastings <
> > hastings.recurs...@gmail.com> wrote:
> >
> >> type:value AND (name:america^1+name:state^1+name:united^1)
> >>
> >> but in reality what you want to do is use the fq parameter with
> type:value
> >>
> >> On Thu, Aug 10, 2017 at 4:36 PM, OTH  wrote:
> >>
> >> > Hello,
> >> >
> >> > I have the following use case:
> >> >
> >> > I have two fields (among others); one is 'name' and the other is
> 'type'.
> >> >  'Name' is the field I need to search, whereas, with 'type', I need to
> >> make
> >> > sure that it has a certain value, depending on the situation.  Often,
> >> when
> >> > I search the 'name' field, the search query would have multiple
> tokens.
> >> > Furthermore, each query token needs to have a scoring weight attached
> to
> >> > it.
> >> >
> >> > However, I'm unable to figure out the syntax which would allow all
> these
> >> > things to happen.
> >> >
> >> > For example, if I use the following query:
> >> > select?q=type:value+AND+name:america^1+name:state^1+name:united^1
> >> > It would only return documents where 'name' includes the token
> 'america'
> >> > (and where type==value).  It will totally ignore
> >> > "+name:state^1+name:united^1", it seems.
> >> >
> >> > This does not happen if I omit "type:value+AND+".  So, with the
> following
> >> > query:
> >> > select?q=name:america^1+name:state^1+name:united^1
> >> > It returns all documents which contain any of the three tokens
> {america,
> >> > state, united}; which is what I need.  However, it also returns
> documents
> >> > where type != value; which I can't have.
> >> >
> >> > If I put "type:value" at the end of the query command, like so:
> >> > select?q=name:america^1+name:state^1+name:united^1+AND+type:value
> >> > In this case, it will only return documents which contain the "united"
> >> > token in the name field (and where type==value).  Again, it will
> totally
> >> > ignore "name:america^1+name:state^1", it seems.
> >> >
> >> > I tried putting an "AND" between everything, like so:
> >> > select?q=type:value+AND+name:america^1+AND+name:state^1+
> >> AND+name:united^1
> >> > But this, of course, would only return documents which contain all the
> >> > tokens {america, state, united}; whereas I need all documents which
> >> contain
> >> > any of those tokens.
> >> >
> >> >
> >> > If anyone could help me out with how this could be done / what the
> >> correct
> >> > syntax would be, that would be a huge help.
> >> >
> >> > Much thanks
> >> > Omer
> >> >
> >>
>
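
A hedged sketch of the `fq` form suggested above, assuming the default OR operator for `q`: the filter clause restricts results without influencing scores, and is cached independently of the main query:

```
select?q=name:america^1 name:state^1 name:united^1&fq=type:value
```

This keeps the scoring query purely disjunctive (any of the three tokens can match) while `type:value` is enforced as a hard, non-scoring filter.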


QueryParser changes query by itself

2017-08-11 Thread Bernd Fehling
We just noticed a very strange problem with Solr 6.4.2 QueryParser.
The QueryParser changes the query by itself from time to time.
This happens when the same search request is reloaded several times at a higher rate.

Good example:
...
<str name="q">textth:waffenhandel</str>
...
<str name="rawquerystring">textth:waffenhandel</str>
<str name="querystring">textth:waffenhandel</str>
<str name="parsedquery">+SynonymQuery(Synonym(textth:"arms sales" textth:"arms trade"...</str>
<str name="parsedquery_toString">+Synonym(textth:"arms sales" textth:"arms trade"...</str>


Bad example:
...
<str name="q">textth:waffenhandel</str>
...
<str name="rawquerystring">textth:waffenhandel</str>
<str name="querystring">textth:waffenhandel</str>
<str name="parsedquery">+textth:rss</str>
<str name="parsedquery_toString">+textth:rss</str>

As you can see in the bad example, after several reloads the parsed query changed
to the term "rss".
But the original query string has no "rss" substring at all. That is really
strange.

Anyone seen this before?

Single index, Solr 6.4.2.

Regards
Bernd


Soft commit uploading datas cant search on website

2017-08-11 Thread Abdul Ekfhy
I configured soft commit in solrconfig.xml. When I add a new
entry (example: solrtest) from the website and run the commit URL below:
"http://192.168.2.10:8983/solr/goods/update?softCommit=true&commit=true "

then check the query keyword:solrtest, it shows the entry
in XML format,

and numDocs for goods also increases by +1,

but it does not show up in the website search.

When I do a hard commit manually, then it shows the result on the website.

Any idea what the issue could be?

my configurations are below

solrconfig.xml

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1</maxTime>
</autoSoftCommit>

solr.log 
..
..
Search on the query (it shows hits=1):
.
2017-08-11 09:15:28.060 INFO  (qtp985934102-14) [c:goods s:shard1
r:core_node3 x:goods_v6.6] o.a.s.c.S.Request [goods_v6.6]  webapp=/solr
path=/select params={q=keyword:solrtest&indent=on&wt=json&_=1502439335248}
hits=1 status=0 QTime=1
.
Search from the website (it shows hits=0):
..
2017-08-11 09:14:37.226 INFO  (qtp985934102-47) [c:goods s:shard1
r:core_node3 x:goods_v6.6] o.a.s.c.S.Request [goods_v6.6]  webapp=/solr
path=/select
params={q=keyword:"*solrtest*"&start=0&fq=storePrice:[0+TO+9]&fq=goodsStatus:0&sort=goodsClick+desc&rows=12&wt=javabin&version=2}
hits=0 status=0 QTime=2


Are there any more configurations I need to add to solrconfig.xml?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Soft-commit-uploading-datas-cant-search-on-website-tp4350186.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr LTR with high rerankDocs

2017-08-11 Thread Sebastian Klemke
Hi

On Do, 2017-08-10 at 08:30 -0700, Erick Erickson wrote:
> I have to confess that I know very little about the mechanics of LTR, but
> I can talk a little bit about compression.
> 
> When a stored value is retrieved for a document, it is read from the
> *.fdt file which is a compressed, verbatim copy of the field. DocValues
> can bypass this stored data and read directly from the DV format.
> There's a discussion of useDocValuesAsStored in solr/CHANGES.txt.
> 
> The restriction of docValues is that they can only be used for
> primitive types, numerics, strings and the like, specifically _not_
> fields with class="solr.TextField".
> 
> WARNING: I have no real clue whether LTR is built to leverage
> docValues fields. If you add docValues="true" to the relevant
> fields you'll have to re-index completely. In fact I'd use a new
> collection.
> 
> And don't be put off by the fact that the index size on disk will grow
> if you add docValues; the data is memory-mapped by the OS
> and will actually _reduce_ your JVM heap requirements.

Yes, DocValues are definitely on our list of things to test.


Regards,

Sebastian


-- 
Sebastian Klemke
Senior Software Engineer
  
ResearchGate GmbH
Invalidenstr. 115, 10115 Berlin, Germany
  
www.researchgate.net
  
Registered Seat: Hannover, HR B 202837
Managing Directors: Dr Ijad Madisch, Dr Sören Hofmayer VAT-ID: DE258434568
A proud affiliate of: ResearchGate Corporation, 350 Townsend St #754, San 
Francisco, CA 94107
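
For reference, a hedged sketch of what enabling docValues on a primitive feature field might look like in a Solr 6-era schema — field and type names are illustrative, and per the warning above, a full reindex (ideally into a new collection) is needed after the change:

```xml
<field name="popularity" type="tfloat" indexed="true" stored="false"
       docValues="true" useDocValuesAsStored="true"/>
```

With `useDocValuesAsStored="true"`, the field can still be returned in results even though `stored="false"`, read from the docValues columns instead of the compressed *.fdt stored data.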



solr commit is taking time

2017-08-11 Thread Midas A
Hi all

Our Solr commit is taking time:

10.20.73.92 - - [11/Aug/2017:15:44:00 +0530] "POST
/solr/##/update?wt=javabin&version=2 HTTP/1.1" 200 - 12594

What should I check?


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-11 Thread SOLR4189
Yes, only because I'm seeing different results. 

For example, can changing *WordDelimiterFilterFactory* to
*WordDelimiterGraphFilterFactory* change the order of docs?
(http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html)

For building the index I tried 2 ways: 1) dataimport from Solr 4 to Solr 6, and
2) the IndexUpgrader tool.
In both cases the order of docs is different.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-order-of-docs-between-SOLR-4-10-4-to-SOLR-6-5-1-tp4349021p4350172.html
Sent from the Solr - User mailing list archive at Nabble.com.