Re: SolrCloud 4.8.0 upgrade

2015-04-20 Thread Vincenzo D'Amore
Hi, 

I'm seriously thinking of upgrading, but how? Could I upgrade one instance at 
a time, or should I stop all the instances, upgrade, and restart everything?

Ciao,
Vincenzo

--
Vincenzo D'Amore
skype: free.dev
mobile: +39 349 8513251

> On Apr 18, 2015, at 2:13 AM, Vincenzo D'Amore  wrote:
> 
> Great!! Thank you very much.
> 
>> On Fri, Apr 17, 2015 at 7:36 PM, Erick Erickson  
>> wrote:
>> Solr/Lucene are supposed to _always_ read one major version back. Thus
>> your 4.10 should be able to read indexes produced all the way back to
>> (and including) 3.x. Sometimes "experimental" formats are excepted.
>> 
>> In your case you should be fine since you're upgrading from 4.8.
>> 
>> As always, though, I'd recommend copying your indexes someplace just
>> to be paranoid before upgrading.
>> 
>> Best,
>> Erick
>> 
>> On Fri, Apr 17, 2015 at 10:28 AM, Vincenzo D'Amore  
>> wrote:
>> > Thanks for your answers. I looked at the changes and we don't use
>> > DocValuesFormat.
>> > The question is, if I upgrade the SolrCloud version to 4.10, do I need to
>> > reload all the documents entirely?
>> > Is there binary compatibility between these two versions when reading the
>> > Solr home?
>> >
>> > On Fri, Apr 17, 2015 at 7:04 PM, Erick Erickson 
>> > wrote:
>> >
>> >> Look at CHANGES.txt for both Lucene and Solr, there's always an
>> >> "upgrading" section for each release.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen 
>> >> wrote:
>> >> > Vincenzo D'Amore  wrote:
>> >> >> I have a SolrCloud cluster with 3 servers. I would like to use
>> >> >> stats.facet, but this feature is available only if I upgrade to 4.10.
>> >> >
>> >> >> May I simply redeploy the new SolrCloud version in Tomcat, or should I
>> >> >> reload all the documents?
>> >> >> Are there other drawbacks?
>> >> >
>> >> > Support for the Disk format for DocValues was removed after 4.8, so
>> >> > you should check whether you use that: docValuesFormat="Disk" for the
>> >> > field in the schema, if I remember correctly.
>> >> >
>> >> > - Toke Eskildsen
>> >>
>> >
>> >
>> >
>> > --
>> > Vincenzo D'Amore
>> > email: v.dam...@gmail.com
>> > skype: free.dev
>> > mobile: +39 349 8513251
> 
> 
> 
> -- 
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
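Not an official upgrade procedure, but the "copy your indexes someplace" step Erick recommends can be sketched in a few lines. Paths here are illustrative assumptions, not your actual layout:

```python
# Minimal sketch of backing up a core's data directory before an upgrade.
# Paths are illustrative assumptions, not Solr defaults you must keep.
import shutil
import time
from pathlib import Path

def backup_index(data_dir, backup_root):
    """Copy an index directory to a timestamped location; returns the copy."""
    src = Path(data_dir)
    dst = Path(backup_root) / ("%s-%s" % (src.name, time.strftime("%Y%m%d%H%M%S")))
    shutil.copytree(src, dst)  # fails fast if dst already exists
    return dst

# e.g. backup_index("example/solr/collection1/data", "/backup")
```

For a rolling upgrade, one common shape is: stop one node, take a copy like this, deploy the new version, restart, and wait for the node to look healthy before moving on to the next one.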


SolrCloud 4.8.0 - load average high during indexing

2015-04-20 Thread Vincenzo D'Amore
Hi all, 

I'm experiencing a very high load average (6/7) while indexing documents.
Sometimes a SolrCloud response can take more than 5 seconds to be returned.
My SolrCloud cluster has 3 nodes, and my collection has 3 shards and 6 
replicas.
I suppose all this load is due to replica syncing, so to lower the load I 
would reduce the number of replicas. What do you think?

Ciao,
Vincenzo

--
Vincenzo D'Amore
skype: free.dev
mobile: +39 349 8513251

Re: solr 4.8.0 update synonyms in zookeeper splitted files

2015-04-20 Thread Vincenzo D'Amore
Hi Shawn, 

Thanks again for the answer. I'm not using implicit document routing. I have 
restarted all the nodes (Tomcat stop/start), but after a couple of days or even 
less, we again get random results (sometimes).
If I have different replicas of my index with different settings, how can I 
definitively restore the situation? Should I remove and recreate all the 
replicas?
I was also thinking this problem may be due to a bug, so I could upgrade to 
4.10. But how? Could I stop a node, upgrade Solr, and start it again?

> 
> If numFound is changing when you run the same query multiple times, there is 
> one of two things happening:
> 
> 1) You have documents with the same uniqueKey value in more than one shard.  
> This can happen if you are using implicit (manual) document routing for 
> multiple shards.
> 
> 2) Different replicas of your index have different settings (such as the 
> synonyms), or different documents in the index. Different settings can happen 
> if you update the config and then only reload/restart some of your cores.  
> Different documents in different replicas is usually an indication of a bug, 
> or something going very wrong, such as OutOfMemory errors.
> 
> Thanks,
> Shawn
> 


Re: Differentiating user search term in Solr

2015-04-20 Thread Walter Underwood
I’ve been wanting a “free text” query parser for a while. We could build some 
cool stuff on that: auto-phrasing, entity extraction and weighting, CJK 
tokenization, …

For reference, here are some real-world user queries I have needed to deal 
with. These have exactly matched content.

* +/-
* .hack//Roots
* p=mv

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Apr 20, 2015, at 5:52 PM, Steven White  wrote:

> Hi Erick,
> 
> I think you missed my point.  My request is that Solr support a new URL
> parameter.  If this parameter is set, then EVERYTHING in q is treated as
> raw text (i.e.: Solr will do the escaping vs. the client).
> 
> Thanks
> 
> Steve
> 
> On Mon, Apr 20, 2015 at 1:08 PM, Erick Erickson 
> wrote:
> 
>> How does that address the example query I gave?
>> 
>> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
>> followed by a colon (:)").
>> 
>> bq: "Solr will treat everything in the search string by first passing
>> it to ClientUtils.escapeQueryChars()."
>> 
>> would incorrectly escape the colons after field1, field, field2 and
>> correctly escape the colon after d and in parens. And parens are a
>> reserved character too, so it would incorrectly escape _all_ the
>> parens except the ones surrounding the colon.
>> 
>> The list of reserved characters is pretty unchanging, so I don't think
>> it's too much to ask the app layer, which knows (at least it better
>> know) which bits of the query were user entered, what rules apply as
>> to whether the user can enter field-qualified searches etc. Only armed
>> with that knowledge can the right thing be done, and Solr has no
>> knowledge of those rules.
>> 
>> If you insist that the client shouldn't deal with that, you could
>> always write a custom component that enforces the rules that are
>> particular to your setup. For instance, you may have a rule that you
>> can never field-qualify any term, in which case escaping on the Solr
>> side would work in _your_ situation. But the general case just doesn't
>> fit into the "escape on the Solr side" paradigm.
>> 
>> Best,
>> Erick
>> 
>> 
>> On Mon, Apr 20, 2015 at 9:55 AM, Steven White 
>> wrote:
>>> Hi Erick,
>>> 
>>> I didn't know about ClientUtils.escapeQueryChars(); this is good to know.
>>> Unfortunately I cannot use it because it means I have to import Solr
>>> classes into my client application.  I want to avoid that and create
>>> loose coupling between my application and Solr (just rely on REST).
>>> 
>>> My suggestion is to add a new URL parameter to Solr, such as
>>> "q.ignoreOperators=[true | false]" (or some other name).  If this
>> parameter
>>> is set to "false" or is missing, then the current behavior takes effect,
>> if
>>> it is set to "true", then Solr will treat everything in the search string
>> by
>>> first passing it to ClientUtils.escapeQueryChars().  This way, the client
>>> application doesn't have to: a) be tightly coupled with Solr (require to
>>> link with Solr JARs to use escapeQueryChars), and b) keep up with Solr
>> when
>>> new operators are added.
>>> 
>>> What do you think?
>>> 
>>> Steve
>>> 
>>> On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson <
>> erickerick...@gmail.com>
>>> wrote:
>>> 
 Steve:
 
 In short, no. There's no good way for Solr to solve this problem in
 the _general_ case. Well, actually we could create parsers with rules
 like "if the colon is inside a paren, escape it". Which would
 completely break someone who wants to form queries like
 
 q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
 followed by a colon (:)").
 
 You say: " A better solution would be to have Solr support a new
 parameter that I can pass to Solr as part of the URL."
 
 How would Solr know _which_ parts of the URL to escape in the case
>> above?
 
 You have to do this at the app layer as that's the only place that has
 a clue what the peculiarities of the situation are.
 
 But if you're using SolrJ in your app layer, you can use
 ClientUtils.escapeQueryChars() for user-entered data to do the
 escaping without you having to maintain a separate list.
 
 Best,
 Erick
 
 On Mon, Apr 20, 2015 at 8:39 AM, Steven White 
 wrote:
> Hi Shawn,
> 
> If the user types "title:(Apache: Solr Notes)" (without quotes) then I
 want
> Solr to treat the whole string as raw text string as if I escaped ":",
 "("
> and ")" and any other reserved Solr keywords / tokens.  Using dismax
>> it
> worked for the ":" case, but I still get SyntaxError if I pass it the
> following "title:(Apache: Solr Notes) AND" (here is the full URL):
> 
> 
> 
 
>> http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
> 
> So far, the only solution I can find is for my application to escape all
> Solr operators before sending the string to Solr.

Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Erick,

I think you missed my point.  My request is that Solr support a new URL
parameter.  If this parameter is set, then EVERYTHING in q is treated as
raw text (i.e.: Solr will do the escaping vs. the client).

Thanks

Steve
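For what it's worth, the escaping can be replicated client-side without linking SolrJ. A rough sketch; the character list is my reading of the Lucene 4.x query specials, so verify it against the ClientUtils.escapeQueryChars source for your version:

```python
# Client-side equivalent of SolrJ's ClientUtils.escapeQueryChars, so the
# application need not link Solr JARs. The character list is an assumption
# based on the Lucene 4.x query syntax; verify for your version.
SPECIALS = set('\\+-!():^[]"{}~*?|&;/')

def escape_query_chars(s):
    out = []
    for ch in s:
        if ch in SPECIALS or ch.isspace():
            out.append('\\')  # prefix specials and whitespace with a backslash
        out.append(ch)
    return ''.join(out)

print(escape_query_chars('title:(Apache: Solr Notes) AND'))
```

As Erick notes, this only makes sense when the app knows the whole string is user-entered text; escaping a mixed query would also escape the intentional field qualifiers.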

On Mon, Apr 20, 2015 at 1:08 PM, Erick Erickson 
wrote:

> How does that address the example query I gave?
>
> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> followed by a colon (:)").
>
> bq: "Solr will treat everything in the search string by first passing
> it to ClientUtils.escapeQueryChars()."
>
> would incorrectly escape the colons after field1, field, field2 and
> correctly escape the colon after d and in parens. And parens are a
> reserved character too, so it would incorrectly escape _all_ the
> parens except the ones surrounding the colon.
>
> The list of reserved characters is pretty unchanging, so I don't think
> it's too much to ask the app layer, which knows (at least it better
> know) which bits of the query were user entered, what rules apply as
> to whether the user can enter field-qualified searches etc. Only armed
> with that knowledge can the right thing be done, and Solr has no
> knowledge of those rules.
>
> If you insist that the client shouldn't deal with that, you could
> always write a custom component that enforces the rules that are
> particular to your setup. For instance, you may have a rule that you
> can never field-qualify any term, in which case escaping on the Solr
> side would work in _your_ situation. But the general case just doesn't
> fit into the "escape on the Solr side" paradigm.
>
> Best,
> Erick
>
>
> On Mon, Apr 20, 2015 at 9:55 AM, Steven White 
> wrote:
> > Hi Erick,
> >
> > I didn't know about ClientUtils.escapeQueryChars(); this is good to know.
> > Unfortunately I cannot use it because it means I have to import Solr
> > classes into my client application.  I want to avoid that and create
> > loose coupling between my application and Solr (just rely on REST).
> >
> > My suggestion is to add a new URL parameter to Solr, such as
> > "q.ignoreOperators=[true | false]" (or some other name).  If this
> parameter
> > is set to "false" or is missing, then the current behavior takes effect,
> if
> > it is set to "true", then Solr will treat everything in the search string
> by
> > first passing it to ClientUtils.escapeQueryChars().  This way, the client
> > application doesn't have to: a) be tightly coupled with Solr (require to
> > link with Solr JARs to use escapeQueryChars), and b) keep up with Solr
> when
> > new operators are added.
> >
> > What do you think?
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Steve:
> >>
> >> In short, no. There's no good way for Solr to solve this problem in
> >> the _general_ case. Well, actually we could create parsers with rules
> >> like "if the colon is inside a paren, escape it". Which would
> >> completely break someone who wants to form queries like
> >>
> >> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> >> followed by a colon (:)").
> >>
> >> You say: " A better solution would be to have Solr support a new
> >> parameter that I can pass to Solr as part of the URL."
> >>
> >> How would Solr know _which_ parts of the URL to escape in the case
> above?
> >>
> >> You have to do this at the app layer as that's the only place that has
> >> a clue what the peculiarities of the situation are.
> >>
> >> But if you're using SolrJ in your app layer, you can use
> >> ClientUtils.escapeQueryChars() for user-entered data to do the
> >> escaping without you having to maintain a separate list.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Apr 20, 2015 at 8:39 AM, Steven White 
> >> wrote:
> >> > Hi Shawn,
> >> >
> >> > If the user types "title:(Apache: Solr Notes)" (without quotes) then I
> >> want
> >> > Solr to treat the whole string as raw text string as if I escaped ":",
> >> "("
> >> > and ")" and any other reserved Solr keywords / tokens.  Using dismax
> it
> >> > worked for the ":" case, but I still get SyntaxError if I pass it the
> >> > following "title:(Apache: Solr Notes) AND" (here is the full URL):
> >> >
> >> >
> >> >
> >>
> http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
> >> >
> >> > So far, the only solution I can find is for my application to escape
> all
> >> > Solr operators before sending the string to Solr.  This is fine, but
> it
> >> > means my application will have to adopt to Solr's reserved operators
> as
> >> > Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add that
> to
> >> my
> >> > applications escape list).  A better solution would be to have Solr
> >> support
> >> > a new parameter that I can pass to Solr as part of the URL.
> >> > This parameter will tell Solr to do the escaping for me or not
> (missing
> >> > means the same as don't do the esc

Re: Solr Index data lost

2015-04-20 Thread Erick Erickson
Did you commit before you unplugged the drive? Were you able to see
data in the admin UI _before_ you unplugged the drive?

Best,
Erick

On Mon, Apr 20, 2015 at 3:58 PM, Vijay Bhoomireddy
 wrote:
> Shawn,
>
> I haven’t changed any DirectoryFactory setting in solrconfig.xml, as I am 
> using a local setup with the default configurations.
>
> The device was unmounted successfully (confirmed through the Windows message 
> in the lower-right corner). I am using Solr 4.10.2. I simply press Ctrl-C 
> in the Windows command prompt to stop Solr, in the same window where it was 
> started earlier.
>
> Please correct me if something has been done not in the correct fashion.
>
> Thanks & Regards
> Vijay
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: 20 April 2015 22:34
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Index data lost
>
> On 4/20/2015 2:55 PM, Vijay Bhoomireddy wrote:
>> I have configured Solr example server on a pen drive. I have indexed
>> some content. The data directory was under
>> example/solr/collection1/data which is the default one. After
>> indexing, I stopped the Solr server and unplugged the pen drive and
>> reconnected the same. Now, when I navigate to the SolrAdmin UI, I cannot see 
>> any data in the index.
>>
>> Any pointers please? In this case, though the installation was on a
>> pen-drive, I think it shouldn't matter to Solr on where the data
>> directory is. So I believe this data folder wiping has happened due to
>> server shutdown. Will the data folder be wiped off if the server is
>> restarted or stopped? How to save the index data between machine
>> failures or planned maintenances?
>
> If you are using the default Directory implementation in your solrconfig.xml 
> (NRTCachingDirectoryFactory for 4.x and later, MMapDirectoryFactory for newer 
> 3.x versions), then everything should be persisted correctly.
>
> Did you properly unmount/eject the removable volume before you unplugged it?  
> On a non-windows OS, you might also want to run the 'sync'
> command.  If you didn't do the unmount/eject, you can't be sure that the 
> filesystem was properly closed and fully up-to-date on the device.
>
> What version of Solr did you use and how exactly did you start Solr and the 
> example?  How did you stop Solr?
>
> Thanks,
> Shawn
>
>
>
> --
> The contents of this e-mail are confidential and for the exclusive use of
> the intended recipient. If you receive this e-mail in error please delete
> it from your system immediately and notify us either by e-mail or
> telephone. You should not copy, forward or otherwise disclose the content
> of the e-mail. The views expressed in this communication may not
> necessarily be the view held by WHISHWORKS.


Re: Solr Index data lost

2015-04-20 Thread Shawn Heisey
On 4/20/2015 4:58 PM, Vijay Bhoomireddy wrote:
> I haven’t changed any DirectoryFactory setting in solrconfig.xml, as I am 
> using a local setup with the default configurations.
>
> The device was unmounted successfully (confirmed through the Windows message 
> in the lower-right corner). I am using Solr 4.10.2. I simply press Ctrl-C 
> in the Windows command prompt to stop Solr, in the same window where it was 
> started earlier.

You didn't say what command you used to start Solr, but based on using
Ctrl-C to stop it, I am guessing it was "java -jar start.jar" rather
than the bin/solr script that was new with 4.10.  The Ctrl-C stop method
should result in a graceful shutdown of Solr.

There should be no problems with persistence based on all this.  Any
data that was in the index before you stopped Solr should be there after
you start it back up.  I would not expect to see any problems with a pen
drive other than speed, even if it's a fat32 filesystem rather than NTFS.

Does it behave differently if you move it to a local hard disk?

Thanks,
Shawn



RE: Solr Index data lost

2015-04-20 Thread Vijay Bhoomireddy
Shawn,

I haven’t changed any DirectoryFactory setting in solrconfig.xml, as I am 
using a local setup with the default configurations.

The device was unmounted successfully (confirmed through the Windows message in 
the lower-right corner). I am using Solr 4.10.2. I simply press Ctrl-C 
in the Windows command prompt to stop Solr, in the same window where it was 
started earlier.

Please correct me if something has been done not in the correct fashion.

Thanks & Regards
Vijay

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 20 April 2015 22:34
To: solr-user@lucene.apache.org
Subject: Re: Solr Index data lost

On 4/20/2015 2:55 PM, Vijay Bhoomireddy wrote:
> I have configured Solr example server on a pen drive. I have indexed 
> some content. The data directory was under 
> example/solr/collection1/data which is the default one. After 
> indexing, I stopped the Solr server and unplugged the pen drive and 
> reconnected the same. Now, when I navigate to the SolrAdmin UI, I cannot see 
> any data in the index.
>
> Any pointers please? In this case, though the installation was on a 
> pen-drive, I think it shouldn't matter to Solr on where the data 
> directory is. So I believe this data folder wiping has happened due to 
> server shutdown. Will the data folder be wiped off if the server is 
> restarted or stopped? How to save the index data between machine 
> failures or planned maintenances?

If you are using the default Directory implementation in your solrconfig.xml 
(NRTCachingDirectoryFactory for 4.x and later, MMapDirectoryFactory for newer 
3.x versions), then everything should be persisted correctly.

Did you properly unmount/eject the removable volume before you unplugged it?  
On a non-windows OS, you might also want to run the 'sync'
command.  If you didn't do the unmount/eject, you can't be sure that the 
filesystem was properly closed and fully up-to-date on the device.

What version of Solr did you use and how exactly did you start Solr and the 
example?  How did you stop Solr?

Thanks,
Shawn





Multiple index.timestamp directories using up disk space

2015-04-20 Thread Rishi Easwaran
Hi All,

We are seeing this problem with Solr 4.6 and Solr 4.10.3.
For some reason, SolrCloud tries to recover and creates a new index directory 
(e.g. index.20150420181214550) while keeping the older index as is. This 
creates an issue where the disk fills up and the shard never ends up 
recovering.
Usually this requires manual intervention: bouncing the instance and wiping 
the disk clean to allow a clean recovery.

Any ideas on how to prevent Solr from creating multiple copies of the index 
directory?

Thanks,
Rishi.


Re: Solr Index data lost

2015-04-20 Thread Shawn Heisey
On 4/20/2015 2:55 PM, Vijay Bhoomireddy wrote:
> I have configured Solr example server on a pen drive. I have indexed some
> content. The data directory was under example/solr/collection1/data which is
> the default one. After indexing, I stopped the Solr server and unplugged the
> pen drive and reconnected the same. Now, when I navigate to the SolrAdmin
> UI, I cannot see any data in the index.
>
> Any pointers please? In this case, though the installation was on a
> pen-drive, I think it shouldn't matter to Solr on where the data directory
> is. So I believe this data folder wiping has happened due to server
> shutdown. Will the data folder be wiped off if the server is restarted or
> stopped? How to save the index data between machine failures or planned
> maintenances? 

If you are using the default Directory implementation in your
solrconfig.xml (NRTCachingDirectoryFactory for 4.x and later,
MMapDirectoryFactory for newer 3.x versions), then everything should be
persisted correctly.

Did you properly unmount/eject the removable volume before you unplugged
it?  On a non-windows OS, you might also want to run the 'sync'
command.  If you didn't do the unmount/eject, you can't be sure that the
filesystem was properly closed and fully up-to-date on the device.

What version of Solr did you use and how exactly did you start Solr and
the example?  How did you stop Solr?

Thanks,
Shawn



Re: generate uuid/ id for table which do not have any primary key

2015-04-20 Thread Vishal Swaroop
Thanks... Yes, that is the option we will go forward with.
On Apr 20, 2015 10:52 AM, "Kaushik"  wrote:

> Have you tried select  as id, name, age ?
>
> On Thu, Apr 16, 2015 at 3:34 PM, Vishal Swaroop 
> wrote:
>
> > Just wondering if there is a way to generate uuid/ id in data-config
> > without using combination of fields in query...
> >
> > data-config.xml
> > <dataConfig>
> >   <dataSource
> >     batchSize="2000"
> >     name="test"
> >     type="JdbcDataSource"
> >     driver="oracle.jdbc.OracleDriver"
> >     url="jdbc:oracle:thin:@ldap:"
> >     user="myUser"
> >     password="pwd"/>
> >   <document>
> >     <entity
> >       docRoot="true"
> >       dataSource="test"
> >       query="select name, age from test_user">
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > On Thu, Apr 16, 2015 at 3:18 PM, Vishal Swaroop 
> > wrote:
> >
> > > Thanks Kaushik & Erick..
> > >
> > > Though I can populate uuid using a combination of fields, I need to
> > > change the type to "string", else it throws "Invalid UUID String"
> > >  > > required="true" multiValued="false"/>
> > >
> > > a) I will have ~80 million records and am wondering if performance might
> > > be an issue.
> > > b) So, during updates I can still use a combination of fields, i.e. uuid?
> > >
> > > On Thu, Apr 16, 2015 at 2:44 PM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > >> This seems relevant:
> > >>
> > >>
> > >>
> >
> http://stackoverflow.com/questions/16914324/solr-4-missing-required-field-uuid
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Thu, Apr 16, 2015 at 11:38 AM, Kaushik 
> > wrote:
> > >> > You seem to have defined the field, but you are not populating it in
> > >> > the query. Use a combination of fields to come up with a unique id
> > >> > that can be assigned to uuid. Does that make sense?
> > >> >
> > >> > Kaushik
> > >> >
> > >> > On Thu, Apr 16, 2015 at 2:25 PM, Vishal Swaroop <
> vishal@gmail.com
> > >
> > >> > wrote:
> > >> >
> > >> >> How do I generate a uuid/id (maybe in data-config.xml...) for a
> > >> >> table which does not have any primary key?
> > >> >>
> > >> >> Scenario:
> > >> >> Using DIH I need to import data from a database, but the table does
> > >> >> not have any primary key.
> > >> >> I do have uuid defined in schema.xml and is
> > >> >>  > >> required="true"
> > >> >> multiValued="false"/>
> > >> >> uuid
> > >> >>
> > >> >> data-config.xml
> > >> >> <dataConfig>
> > >> >>   <dataSource
> > >> >>     batchSize="2000"
> > >> >>     name="test"
> > >> >>     type="JdbcDataSource"
> > >> >>     driver="oracle.jdbc.OracleDriver"
> > >> >>     url="jdbc:oracle:thin:@ldap:"
> > >> >>     user="myUser"
> > >> >>     password="pwd"/>
> > >> >>   <document>
> > >> >>     <entity
> > >> >>       docRoot="true"
> > >> >>       dataSource="test"
> > >> >>       query="select name, age from test_user">
> > >> >>     </entity>
> > >> >>   </document>
> > >> >> </dataConfig>
> > >> >>
> > >> >> Error : Document is missing mandatory uniqueKey field: uuid
> > >> >>
> > >>
> > >
> > >
> >
>
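For the archives: the "combination of fields" approach can also be done as a deterministic, name-based UUID, so the same row always gets the same key. A sketch; the field names come from the thread's query, and the namespace URN is an arbitrary illustrative choice:

```python
# Deterministic uniqueKey for rows with no primary key: the same
# (name, age) pair always hashes to the same UUID, so re-imports update
# rather than duplicate. The namespace URN is an illustrative assumption.
import uuid

NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:test_user")

def row_uuid(name, age):
    return str(uuid.uuid5(NAMESPACE, "%s|%s" % (name, age)))
```

The usual caveat applies: collisions are only avoided if the chosen column combination is actually unique per row.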


Solr Index data lost

2015-04-20 Thread Vijay Bhoomireddy
Hi,

 

I have configured Solr example server on a pen drive. I have indexed some
content. The data directory was under example/solr/collection1/data which is
the default one. After indexing, I stopped the Solr server and unplugged the
pen drive and reconnected the same. Now, when I navigate to the SolrAdmin
UI, I cannot see any data in the index.

 

Any pointers please? In this case, though the installation was on a
pen-drive, I think it shouldn't matter to Solr on where the data directory
is. So I believe this data folder wiping has happened due to server
shutdown. Will the data folder be wiped off if the server is restarted or
stopped? How to save the index data between machine failures or planned
maintenances? 

 

Thanks & Regards

Vijay




Re: Has anyone seen this error?

2015-04-20 Thread vsilgalis
I fixed this issue by reloading the core on the leader for the shard.

Still curious how this happened; any help would be greatly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Has-anyone-seen-this-error-tp4200975p4201067.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-20 Thread Rishi Easwaran
So is there anything that can be done from a tuning perspective to recover a 
shard that is 75%-90% full, other than getting rid of the index and rebuilding 
the data?
Also, to prevent this issue from recurring, it looks like we need to make our 
system more aggressive with segment merges, using a lower merge factor.

 
Thanks,
Rishi.

 

-Original Message-
From: Shawn Heisey 
To: solr-user 
Sent: Mon, Apr 20, 2015 11:25 am
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


On 4/20/2015 8:44 AM, Rishi Easwaran wrote:
> Yeah I noticed that. Looks like optimize won't work since on some disks
> we are already pretty full.
> Any thoughts on increasing/decreasing 10 or ConcurrentMergeScheduler to
> make solr do merges faster.

You don't have to do an optimize to need 2x disk space.  Even normal
merging, if it happens just right, can require the same disk space as a
full optimize.  Normal Solr operation requires that you have enough
space for your index to reach at least double size on occasion.

Higher merge factors are better for indexing speed, because merging
happens less frequently.  Lower merge factors are better for query
speed, at least after the merging finishes, because merging happens more
frequently and there are fewer total segments at any given moment.

During a merge, there is so much I/O that query speed is often
negatively affected.

Thanks,
Shawn
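For reference, the merge-factor tuning discussed in this thread lives in solrconfig.xml on 4.x. A sketch only; the values are illustrative, not recommendations, and the element names should be checked against your version's example config:

```xml
<!-- Illustrative only: lower maxMergeAtOnce/segmentsPerTier values merge
     more aggressively (fewer segments, better query speed, more merge I/O);
     higher values favor indexing speed. -->
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">5</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
</indexConfig>
```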


 


Re: Is it possible to facet on the results of a custom solr function?

2015-04-20 Thread Motulewicz, Michael
Solved my own problem.

Using multiple function range query parsers works fine against my custom 
function

&facet.query={!frange l=1 u=1} MyCustomSolrQuery(param1,param2, param3)
&facet.query={!frange l=2 u=2} MyCustomSolrQuery(param1,param2, param3)
Etc…

This gives me the counts for 1, then 2, etc.

Not sure if there’s a better way, but this works
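A sketch of what building that request can look like client-side, with one {!frange} facet.query per bucket. The host, core, and function name are this thread's placeholders, not real APIs:

```python
# One facet.query per value bucket; {!frange l=N u=N} counts documents
# whose function value falls in [N, N]. The function and core names are
# this thread's placeholders.
from urllib.parse import urlencode

params = [('q', '*:*'), ('rows', '0'), ('facet', 'true')]
for bucket in (1, 2, 3):
    params.append(('facet.query',
                   '{!frange l=%d u=%d}MyCustomSolrQuery(param1,param2,param3)'
                   % (bucket, bucket)))
url = 'http://localhost:8983/solr/db/select?' + urlencode(params)
print(url)
```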



From: Michael Motulewicz <michael.motulew...@healthsparq.com>
Reply-To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Monday, April 13, 2015 at 11:40 AM
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Subject: Is it possible to facet on the results of a custom solr function?

Hi,

  I’m attempting to facet on the results of a custom Solr function. I’ve been 
trying all kinds of combinations that I think should work, but I keep getting 
errors. I’m starting to wonder if it is possible.

   I’m using Solr 4.0, and here is how I am calling it:
&facet.query={!func}myCustomSolrQuery(param1, param2, param3)
   (My function returns an array of results like {“M:3”, “D:4”})

   This returns the same number as the total results.

  I’ve tried adding to the end of the query to break out the results, like this:
&facet.query={!func}myCustomSolrQuery(param1, param2, param3) : (“M:3”)
  No matter what I put, I get parsing exceptions.

Thanks for any help!
Mike


Ensure a sustainable future - only print when necessary.
IMPORTANT NOTICE: This communication, including any attachment, contains
information that may be confidential or privileged, and is intended solely for
the entity or individual to whom it is addressed.  If you are not the intended
recipient, you should delete this message and are hereby notified that any
disclosure, copying, or distribution of this message is strictly prohibited.
Nothing in this email, including any attachment, is intended to be a legally
binding signature.


RE: search by person name

2015-04-20 Thread Pedro Figueiredo
Hi Steve,

Thanks, it works! 
I will analyse your solution in detail because I have never used the () syntax.
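For reference, a sketch of how the two query forms from this thread would be URL-encoded by a client (the field name is the thread's example):

```python
# 'name:(ana jose)' matches the terms in any order (with q.op=AND, both
# terms must be present); 'name:"ana jose"~2' is the phrase-with-slop
# form Erick suggested.
from urllib.parse import urlencode

any_order = urlencode({'q': 'name:(ana jose)', 'q.op': 'AND'})
with_slop = urlencode({'q': 'name:"ana jose"~2'})
print(any_order)
print(with_slop)
```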

Best regards,

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150
 

Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY. CMMI® is registered in the USPTO by CMU.
 


-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: 20 de abril de 2015 17:39
To: solr-user@lucene.apache.org
Subject: Re: search by person name

Why not just use q=name:(ana jose)?  Then missing words or word order won't 
matter.  No?

Steve

On Mon, Apr 20, 2015 at 12:26 PM, Erick Erickson 
wrote:

> First, a little patience on your part please, we're all volunteers here.
>
> Second, what have you done to try to analyze the problem? Have you 
> tried adding &debug=query to your URL? Looked at the Analysis page?
> Anything else?
>
> You might review: http://wiki.apache.org/solr/UsingMailingLists
>
> My guess (and Rafal provided you a strong clue if my guess is right) 
> is that by enclosing "ana jose" in quotes you've created a phrase 
> query that requires the two words to be right next to each other and 
> they have "maria" between them. Using "slop", i.e. "ana jose"~2 should 
> find the doc if I'm correct.
>
> Best,
> Erick
>
> On Mon, Apr 20, 2015 at 7:41 AM, Pedro Figueiredo 
>  wrote:
> > Any help please?
> >
> > PF
> >
> > -Original Message-
> > From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com]
> > Sent: 20 de abril de 2015 14:19
> > To: solr-user@lucene.apache.org
> > Subject: RE: search by person name
> >
> > yes
> >
> > Pedro Figueiredo
> > Senior Engineer
> >
> > pjlfigueir...@criticalsoftware.com
> > M. 934058150
> >
> >
> > Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, 
> > Portugal T. +351
> 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
> >
> > PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
> LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU"
> >
> >
> >
> > -Original Message-
> > From: Rafal Kuc [mailto:ra...@alud.com.pl]
> > Sent: 20 de abril de 2015 14:10
> > To: solr-user@lucene.apache.org
> > Subject: Re: search by person name
> >
> > Hello,
> >
> > What does your query look like? Do you use a phrase query, like 
> > q=name:"ana
> jose" ?
> >
> > ---
> > Regards,
> > Rafał Kuć
> >
> >
> >
> >
> >> On 20 Apr 2015, at 15:06, Pedro Figueiredo <
> pjlfigueir...@criticalsoftware.com> wrote:
> >>
> >> Hi all,
> >>
> >> Can anyone advise the tokens and filters to use, for the most 
> >> common
> way to search by people’s names.
> >> The basic requirements are:
> >>
> >> For field name – “Ana Maria José”
> >> The following searches should return the example:
> >> 1.   “Ana”
> >> 2.   “Maria”
> >> 3.   “Jose”
> >> 4.   “ana maria”
> >> 5.   “ana jose”
> >>
> >> With the following configuration I’m not able to satisfy all the
> searches (namely the last one….):
> >> 
> >> 
> >> 
> >>
> >> Thanks in advance,
> >>
> >> Pedro Figueiredo
> >> Senior Engineer
> >>
> >> pjlfigueir...@criticalsoftware.com
> >> 
> >> M. 934058150
> >>
> >>
> >> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, 
> >> Portugal T. +351 229 446 927 | F. +351 229 446 929 
> >> www.criticalsoftware.com 
> >>
> >> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A 
> >> CMMI® LEVEL 5 RATED COMPANY  CMMI® is 
> >> registered
> in the USPTO by CMU "
> >
> >
>



Re: Differentiating user search term in Solr

2015-04-20 Thread Erick Erickson
How does that address the example query I gave?

q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
followed by a colon (:)").

bq: "Solr will treat everything in the search string by first passing
it to ClientUtils.escapeQueryChars()."

would incorrectly escape the colons after field1, field, field2 and
correctly escape the colon after d and in parens. And parens are a
reserved character too, so it would incorrectly escape _all_ the
parens except the ones surrounding the colon.

The list of reserved characters is pretty unchanging, so I don't think
it's too much to ask the app layer, which knows (at least it better
know) which bits of the query were user entered, what rules apply as
to whether the user can enter field-qualified searches etc. Only armed
with that knowledge can the right thing be done, and Solr has no
knowledge of those rules.

If you insist that the client shouldn't deal with that, you could
always write a custom component that enforces the rules that are
particular to your setup. For instance, you may have a rule that you
can never field-qualify any term, in which case escaping on the Solr
side would work in _your_ situation. But the general case just doesn't
fit into the "escape on the Solr side" paradigm.

Best,
Erick


On Mon, Apr 20, 2015 at 9:55 AM, Steven White  wrote:
> Hi Erick,
>
> I didn't know about ClientUtils.escapeQueryChars(), this is good to know.
> Unfortunately I cannot use it because it means I have to import Solr
> classes with my client application.  I want to avoid that and create a
> loose coupling between my application and Solr (just rely on REST).
>
> My suggestion is to add a new URL parameter to Solr, such as
> "q.ignoreOperators=[true | false]" (or some other name).  If this parameter
> is set to "false" or is missing, then the current behavior takes effect; if
> it is set to "true" then Solr will treat everything in the search string by
> first passing it to ClientUtils.escapeQueryChars().  This way, the client
> application doesn't have to: a) be tightly coupled with Solr (require to
> link with Solr JARs to use escapeQueryChars), and b) keep up with Solr when
> new operators are added.
>
> What do you think?
>
> Steve
>
> On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson 
> wrote:
>
>> Steve:
>>
>> In short, no. There's no good way for Solr to solve this problem in
>> the _general_ case. Well, actually we could create parsers with rules
>> like "if the colon is inside a paren, escape it". Which would
>> completely break someone who wants to form queries like
>>
>> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
>> followed by a colon (:)").
>>
>> You say: " A better solution would be to have Solr support a new
>> parameter that I can pass to Solr as part of the URL."
>>
>> How would Solr know _which_ parts of the URL to escape in the case above?
>>
>> You have to do this at the app layer as that's the only place that has
>> a clue what the peculiarities of the situation are.
>>
>> But if you're using SolrJ in your app layer, you can use
>> ClientUtils.escapeQueryChars() for user-entered data to do the
>> escaping without you having to maintain a separate list.
>>
>> Best,
>> Erick
>>
>> On Mon, Apr 20, 2015 at 8:39 AM, Steven White 
>> wrote:
>> > Hi Shawn,
>> >
>> > If the user types "title:(Apache: Solr Notes)" (without quotes) then I
>> want
>> > Solr to treat the whole string as raw text string as if I escaped ":",
>> "("
>> > and ")" and any other reserved Solr keywords / tokens.  Using dismax it
>> > worked for the ":" case, but I still get SyntaxError if I pass it the
>> > following "title:(Apache: Solr Notes) AND" (here is the full URL):
>> >
>> >
>> >
>> http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
>> >
>> > So far, the only solution I can find is for my application to escape all
>> > Solr operators before sending the string to Solr.  This is fine, but it
>> > means my application will have to adopt to Solr's reserved operators as
>> > Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add that to
>> my
>> > applications escape list).  A better solution would be to have Solr
>> support
>> > a new parameter that I can pass to Solr as part of the URL.
>> > This parameter will tell Solr to do the escaping for me or not (missing
>> > means the same as don't do the escaping).
>> >
>> > Thanks
>> >
>> > Steve
>> >
>> > On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey 
>> wrote:
>> >
>> >> On 4/20/2015 7:41 AM, Steven White wrote:
>> >> > In my application, a user types "Apache Solr Notes".  I take that text
>> >> and
>> >> > send it over to Solr like so:
>> >> >
>> >> >
>> >> >
>> >>
>> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
>> >> >
>> >> > And I get a hit on "Apache Solr Release Notes".  This is all good.
>> >> >
>> >> > No

RE: search by person name

2015-04-20 Thread Pedro Figueiredo
Hi Erick,

I apologize if I gave the wrong impression; it was not my intention.
I've tried a few extra filters and tokenizers, and with a few extra searches on 
Google I found the proximity (slop) parameter, which solved my issue.

Anyway, many thanks for your feedback and again, I apologize for any 
misunderstanding.
 
Thank you,
PF

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 20 de abril de 2015 17:26
To: solr-user@lucene.apache.org
Subject: Re: search by person name

First, a little patience on your part please, we're all volunteers here.

Second, what have you done to try to analyze the problem? Have you tried adding 
&debug=query to your URL? Looked at the analysis page?
Anything else?

You might review: http://wiki.apache.org/solr/UsingMailingLists

My guess (and Rafal provided you a strong clue if my guess is right) is that by 
enclosing "ana jose" in quotes you've created a phrase query that requires the 
two words to be right next to each other and they have "maria" between them. 
Using "slop", i.e. "ana jose"~2 should find the doc if I'm correct.

Best,
Erick
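Erick's suggestion is easy to try straight from a script; a minimal sketch of building the slop-query URL (the host, core, and field names are assumptions, not from the thread):

```python
from urllib.parse import urlencode

# "ana jose"~2 is a phrase query with slop 2: up to two position moves
# are allowed, which is enough for "Ana Maria José" to match.
params = {
    "q": 'name:"ana jose"~2',
    "debug": "query",  # shows how Solr parsed the query in the response
    "wt": "json",
}
url = "http://localhost:8983/solr/people/select?" + urlencode(params)
print(url)
```

Fetching this URL with debug=query also lets you confirm in the parsed-query output that the phrase and slop survived analysis.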

On Mon, Apr 20, 2015 at 7:41 AM, Pedro Figueiredo 
 wrote:
> Any help please?
>
> PF
>
> -Original Message-
> From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com]
> Sent: 20 de abril de 2015 14:19
> To: solr-user@lucene.apache.org
> Subject: RE: search by person name
>
> yes
>
> Pedro Figueiredo
> Senior Engineer
>
> pjlfigueir...@criticalsoftware.com
> M. 934058150
>
>
> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal 
> T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
>
> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
> RATED COMPANY CMMI® is registered in the USPTO by CMU"
>
>
>
> -Original Message-
> From: Rafal Kuc [mailto:ra...@alud.com.pl]
> Sent: 20 de abril de 2015 14:10
> To: solr-user@lucene.apache.org
> Subject: Re: search by person name
>
> Hello,
>
> > What does your query look like? Do you use a phrase query, like q=name:"ana jose" 
> ?
>
> ---
> Regards,
> Rafał Kuć
>
>
>
>
>> On 20 Apr 2015, at 15:06, Pedro Figueiredo 
>>  wrote:
>>
>> Hi all,
>>
>> Can anyone advise on the tokenizers and filters to use for the most common 
>> way to search by people’s names?
>> The basic requirements are:
>>
>> For field name – “Ana Maria José”
>> The following searches should return the example:
>> 1.   “Ana”
>> 2.   “Maria”
>> 3.   “Jose”
>> 4.   “ana maria”
>> 5.   “ana jose”
>>
>> With the following configuration I’m not able to satisfy all the searches 
>> (namely the last one….):
>> 
>> 
>> 
>>
>> Thanks in advance,
>>
>> Pedro Figueiredo
>> Senior Engineer
>>
>> pjlfigueir...@criticalsoftware.com
>> 
>> M. 934058150
>>
>>
>> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal 
>> T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com 
>> 
>>
>> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® 
>> LEVEL 5 RATED COMPANY  CMMI® is registered in the 
>> USPTO by CMU "
>
>



Re: Has anyone seen this error?

2015-04-20 Thread vsilgalis
The leader in the cluster is what is throwing the error.

One of the stack traces:
 

However I didn't notice this one before which has a bit more info:
org.apache.solr.common.SolrException: Conflict

request:
http://:8080/solr/classic_bt/update?update.chain=savolangid&update.distrib=TOLEADER&distrib.from=http%3A%2F%2F%3A8080%2Fsolr%2Fclassic_bt%2F&wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

We are using solr 4.10.2 and sadly our logging outside of the warn/error is
a little lackluster (I'm working on fixing it but need to work through a
change approval process).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Has-anyone-seen-this-error-tp4200975p4201019.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Erick,

I didn't know about ClientUtils.escapeQueryChars(), this is good to know.
Unfortunately I cannot use it because it means I have to import Solr
classes with my client application.  I want to avoid that and create a
loose coupling between my application and Solr (just rely on REST).

My suggestion is to add a new URL parameter to Solr, such as
"q.ignoreOperators=[true | false]" (or some other name).  If this parameter
is set to "false" or is missing, then the current behavior takes effect; if
it is set to "true" then Solr will treat everything in the search string by
first passing it to ClientUtils.escapeQueryChars().  This way, the client
application doesn't have to: a) be tightly coupled with Solr (require to
link with Solr JARs to use escapeQueryChars), and b) keep up with Solr when
new operators are added.

What do you think?

Steve
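For anyone who wants this escaping client-side without linking SolrJ, the behavior of ClientUtils.escapeQueryChars() is small enough to re-implement; here is a sketch in Python (the character set below is my reading of the SolrJ source and should be verified against your Solr version):

```python
import re

# Lucene/Solr query-parser metacharacters, mirroring SolrJ's
# ClientUtils.escapeQueryChars (an assumption -- check your version).
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&;/]|\s)')

def escape_query_chars(s: str) -> str:
    """Backslash-escape every query metacharacter in user input."""
    return _SPECIAL.sub(r'\\\1', s)

print(escape_query_chars("title:(Apache: Solr Notes)"))
```

Applying this to the whole user string makes every metacharacter literal, which is exactly the all-or-nothing trade-off discussed in this thread: the escaping has to happen where you still know which bits were user-entered.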

On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson 
wrote:

> Steve:
>
> In short, no. There's no good way for Solr to solve this problem in
> the _general_ case. Well, actually we could create parsers with rules
> like "if the colon is inside a paren, escape it". Which would
> completely break someone who wants to form queries like
>
> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> followed by a colon (:)").
>
> You say: " A better solution would be to have Solr support a new
> parameter that I can pass to Solr as part of the URL."
>
> How would Solr know _which_ parts of the URL to escape in the case above?
>
> You have to do this at the app layer as that's the only place that has
> a clue what the peculiarities of the situation are.
>
> But if you're using SolrJ in your app layer, you can use
> ClientUtils.escapeQueryChars() for user-entered data to do the
> escaping without you having to maintain a separate list.
>
> Best,
> Erick
>
> On Mon, Apr 20, 2015 at 8:39 AM, Steven White 
> wrote:
> > Hi Shawn,
> >
> > If the user types "title:(Apache: Solr Notes)" (without quotes) then I
> want
> > Solr to treat the whole string as raw text string as if I escaped ":",
> "("
> > and ")" and any other reserved Solr keywords / tokens.  Using dismax it
> > worked for the ":" case, but I still get SyntaxError if I pass it the
> > following "title:(Apache: Solr Notes) AND" (here is the full URL):
> >
> >
> >
> http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
> >
> > So far, the only solution I can find is for my application to escape all
> > Solr operators before sending the string to Solr.  This is fine, but it
> > means my application will have to adopt to Solr's reserved operators as
> > Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add that to
> my
> > applications escape list).  A better solution would be to have Solr
> support
> > a new parameter that I can pass to Solr as part of the URL.
> > This parameter will tell Solr to do the escaping for me or not (missing
> > means the same as don't do the escaping).
> >
> > Thanks
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey 
> wrote:
> >
> >> On 4/20/2015 7:41 AM, Steven White wrote:
> >> > In my application, a user types "Apache Solr Notes".  I take that text
> >> and
> >> > send it over to Solr like so:
> >> >
> >> >
> >> >
> >>
> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
> >> >
> >> > And I get a hit on "Apache Solr Release Notes".  This is all good.
> >> >
> >> > Now if the same user types "Apache: Solr Notes" (notice the ":" after
> >> > "Apache") I will get a SyntaxError.  The fix is to escape ":" before I
> >> send
> >> > it to Solr.  What I want to figure out is how can I tell Solr /
> Lucene to
> >> > ignore ":" and escape it for me?  In this example, I used ":" but my
> need
> >> > is for all other operators and reserved Solr / Lucene characters.
> >>
> >> If we assume that what you did for the first query is what you will do
> >> for the second query, then this is what you would have sent:
> >>
> >> q=title:(Apache: Solr Notes)
> >>
> >> How is the parser supposed to know that only the second colon should be
> >> escaped, and not the first one?  If you escape them both (or treat the
> >> entire query string as query text), then the fact that you are searching
> >> the "title" field is lost.  The text "title" becomes an actual part of
> >> the query, and may not match, depending on what you have done with other
> >> parameters, such as the default operator.
> >>
> >> If you use the dismax parser (*NOT* the edismax parser, which parses
> >> field:value queries and boolean operator syntax just like the lucene
> >> parser), you may be able to achieve what you're after.
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
> >> https://wiki.apache.org/solr/DisMaxQParserPlugin
> >>
> >> With dismax, you would use the qf and possibly the pf parameter to

Re: search by person name

2015-04-20 Thread Yavar Husain
In this case q=name:(ana jose) will work, but if it is searched against a
full-text field it may have poor precision: it will also match a document like
"San Jose is better than Santa Ana", which was not the user's intent. Erick's
solution, "ana jose"~2, captures the intent too.

On Mon, Apr 20, 2015 at 10:09 PM, Steven White  wrote:

> Why not just use q=name:(ana jose) ?  Then missing words or word order
> won't matter.  No?
>
> Steve
>
> On Mon, Apr 20, 2015 at 12:26 PM, Erick Erickson 
> wrote:
>
> > First, a little patience on your part please, we're all volunteers here.
> >
> > Second, what have you done to try to analyze the problem? Have you
> > tried adding &debug=query to your URL? Looked at the analysis page?
> > Anything else?
> >
> > You might review: http://wiki.apache.org/solr/UsingMailingLists
> >
> > My guess (and Rafal provided you a strong clue if my guess is right)
> > is that by enclosing "ana jose" in quotes you've created a phrase
> > query that requires the two words to be right next to each other and
> > they have "maria" between them. Using "slop", i.e. "ana jose"~2 should
> > find the doc if I'm correct.
> >
> > Best,
> > Erick
> >
> > On Mon, Apr 20, 2015 at 7:41 AM, Pedro Figueiredo
> >  wrote:
> > > Any help please?
> > >
> > > PF
> > >
> > > -Original Message-
> > > From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com]
> > > Sent: 20 de abril de 2015 14:19
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: search by person name
> > >
> > > yes
> > >
> > > Pedro Figueiredo
> > > Senior Engineer
> > >
> > > pjlfigueir...@criticalsoftware.com
> > > M. 934058150
> > >
> > >
> > > Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
> T. +351
> > 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
> > >
> > > PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
> > LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU"
> > >
> > >
> > >
> > > -Original Message-
> > > From: Rafal Kuc [mailto:ra...@alud.com.pl]
> > > Sent: 20 de abril de 2015 14:10
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: search by person name
> > >
> > > Hello,
> > >
> > > What does your query look like? Do you use a phrase query, like q=name:"ana
> > jose" ?
> > >
> > > ---
> > > Regards,
> > > Rafał Kuć
> > >
> > >
> > >
> > >
> > >> On 20 Apr 2015, at 15:06, Pedro Figueiredo <
> > pjlfigueir...@criticalsoftware.com> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Can anyone advise the tokens and filters to use, for the most common
> > way to search by people’s names.
> > >> The basic requirements are:
> > >>
> > >> For field name – “Ana Maria José”
> > >> The following searches should return the example:
> > >> 1.   “Ana”
> > >> 2.   “Maria”
> > >> 3.   “Jose”
> > >> 4.   “ana maria”
> > >> 5.   “ana jose”
> > >>
> > >> With the following configuration I’m not able to satisfy all the
> > searches (namely the last one….):
> > >> 
> > >> 
> > >> 
> > >>
> > >> Thanks in advance,
> > >>
> > >> Pedro Figueiredo
> > >> Senior Engineer
> > >>
> > >> pjlfigueir...@criticalsoftware.com
> > >> 
> > >> M. 934058150
> > >>
> > >>
> > >> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
> > >> T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
> > >> 
> > >>
> > >> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
> > >> LEVEL 5 RATED COMPANY  CMMI® is registered
> > in the USPTO by CMU "
> > >
> > >
> >
>


Re: Search in Solr Index

2015-04-20 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Matt and Yavar for the suggestion.

Now I have fixed the issue.

For others' benefit: the issue was that the fields were defined as String. I
have now changed them to text_general. Also, instead of indexing these
individual fields directly, I created corresponding copyFields for each of
them, with the dest field set to text.

This fixed the issue.

Thanks for your help!!


Thanks & Regards
Vijay


On 20 April 2015 at 17:11, Yavar Husain  wrote:

> There might be issues with your default search field. Suppose you are
> searching a field named "MyTestField"; then give your query as
> MyTestField:Birmingham
> and see if you get any results. As Matt suggested there might be some
> issues with the way you have done tokenization/analysis etc.
>
>
>
> On Mon, Apr 20, 2015 at 9:21 PM, Matt Kuiper 
> wrote:
>
> > What type of field are you using? String?  If so try another type, like
> > text_general.
> >
> > I believe with type String the contents are stored in the index exactly
> as
> > they are inputted into the index.  So a search hit will have to match
> > exactly the full value of the field, I assume in your case "Birmingham"
> is
> > only part of the value.  With text_general and other types, the value
> will
> > be tokenized and allow for hits on parts or variants of the value.
> >
> > Matt
> >
> >
> > -Original Message-
> > From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:
> > vijaya.bhoomire...@whishworks.com]
> > Sent: Monday, April 20, 2015 9:31 AM
> > To: solr-user@lucene.apache.org
> > Subject: Search in Solr Index
> >
> > Hi,
> >
> > I am indexing some data from a Database. Data is getting indexed properly
> > and when I query in the Solr stock UI with query parameters as *.*, I
> could
> > see the documents with all the fields listed and as well the numFound
> > reflecting properly. However,  if I perform a query with a simple string
> > for example "Birmingham", numFound returns 0 with no records to be
> > displayed. There are records which are indexed that contains fields with
> > the text "Birmingham". In the schema.xml, all the fields have been
> defined
> > as indexed="true" and stored="true"
> >
> > This is happening for any search query string. What could be the reason
> > for this behavior?
> >
> >
> > Thanks & Regards
> > Vijay
> >
> > --
> > The contents of this e-mail are confidential and for the exclusive use of
> > the intended recipient. If you receive this e-mail in error please delete
> > it from your system immediately and notify us either by e-mail or
> > telephone. You should not copy, forward or otherwise disclose the content
> > of the e-mail. The views expressed in this communication may not
> > necessarily be the view held by WHISHWORKS.
> >
>



Re: Differentiating user search term in Solr

2015-04-20 Thread Erick Erickson
Steve:

In short, no. There's no good way for Solr to solve this problem in
the _general_ case. Well, actually we could create parsers with rules
like "if the colon is inside a paren, escape it". Which would
completely break someone who wants to form queries like

q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
followed by a colon (:)").

You say: " A better solution would be to have Solr support a new
parameter that I can pass to Solr as part of the URL."

How would Solr know _which_ parts of the URL to escape in the case above?

You have to do this at the app layer as that's the only place that has
a clue what the peculiarities of the situation are.

But if you're using SolrJ in your app layer, you can use
ClientUtils.escapeQueryChars() for user-entered data to do the
escaping without you having to maintain a separate list.

Best,
Erick

On Mon, Apr 20, 2015 at 8:39 AM, Steven White  wrote:
> Hi Shawn,
>
> If the user types "title:(Apache: Solr Notes)" (without quotes) then I want
> Solr to treat the whole string as a raw text string, as if I escaped ":", "("
> and ")" and any other reserved Solr keywords / tokens.  Using dismax it
> worked for the ":" case, but I still get SyntaxError if I pass it the
> following "title:(Apache: Solr Notes) AND" (here is the full URL):
>
>
> http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
>
> So far, the only solution I can find is for my application to escape all
> Solr operators before sending the string to Solr.  This is fine, but it
> means my application will have to adopt to Solr's reserved operators as
> Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add that to my
> applications escape list).  A better solution would be to have Solr support
> a new parameter that I can pass to Solr as part of the URL.
> This parameter will tell Solr to do the escaping for me or not (missing
> means the same as don't do the escaping).
>
> Thanks
>
> Steve
>
> On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey  wrote:
>
>> On 4/20/2015 7:41 AM, Steven White wrote:
>> > In my application, a user types "Apache Solr Notes".  I take that text
>> and
>> > send it over to Solr like so:
>> >
>> >
>> >
>> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
>> >
>> > And I get a hit on "Apache Solr Release Notes".  This is all good.
>> >
>> > Now if the same user types "Apache: Solr Notes" (notice the ":" after
>> > "Apache") I will get a SyntaxError.  The fix is to escape ":" before I
>> send
>> > it to Solr.  What I want to figure out is how can I tell Solr / Lucene to
>> > ignore ":" and escape it for me?  In this example, I used ":" but my need
>> > is for all other operators and reserved Solr / Lucene characters.
>>
>> If we assume that what you did for the first query is what you will do
>> for the second query, then this is what you would have sent:
>>
>> q=title:(Apache: Solr Notes)
>>
>> How is the parser supposed to know that only the second colon should be
>> escaped, and not the first one?  If you escape them both (or treat the
>> entire query string as query text), then the fact that you are searching
>> the "title" field is lost.  The text "title" becomes an actual part of
>> the query, and may not match, depending on what you have done with other
>> parameters, such as the default operator.
>>
>> If you use the dismax parser (*NOT* the edismax parser, which parses
>> field:value queries and boolean operator syntax just like the lucene
>> parser), you may be able to achieve what you're after.
>>
>> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>> https://wiki.apache.org/solr/DisMaxQParserPlugin
>>
>> With dismax, you would use the qf and possibly the pf parameter to tell
>> it which fields to search and send this as the query:
>>
>> q=Apache: Solr Notes
>>
>> Thanks,
>> Shawn
>>
>>


Re: search by person name

2015-04-20 Thread Steven White
Why not just use q=name:(ana jose) ?  Then missing words or word order
won't matter.  No?

Steve

On Mon, Apr 20, 2015 at 12:26 PM, Erick Erickson 
wrote:

> First, a little patience on your part please, we're all volunteers here.
>
> Second, what have you done to try to analyze the problem? Have you
> tried adding &debug=query to your URL? Looked at the analysis page?
> Anything else?
>
> You might review: http://wiki.apache.org/solr/UsingMailingLists
>
> My guess (and Rafal provided you a strong clue if my guess is right)
> is that by enclosing "ana jose" in quotes you've created a phrase
> query that requires the two words to be right next to each other and
> they have "maria" between them. Using "slop", i.e. "ana jose"~2 should
> find the doc if I'm correct.
>
> Best,
> Erick
>
> On Mon, Apr 20, 2015 at 7:41 AM, Pedro Figueiredo
>  wrote:
> > Any help please?
> >
> > PF
> >
> > -Original Message-
> > From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com]
> > Sent: 20 de abril de 2015 14:19
> > To: solr-user@lucene.apache.org
> > Subject: RE: search by person name
> >
> > yes
> >
> > Pedro Figueiredo
> > Senior Engineer
> >
> > pjlfigueir...@criticalsoftware.com
> > M. 934058150
> >
> >
> > Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. 
> > +351
> 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
> >
> > PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
> LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU"
> >
> >
> >
> > -Original Message-
> > From: Rafal Kuc [mailto:ra...@alud.com.pl]
> > Sent: 20 de abril de 2015 14:10
> > To: solr-user@lucene.apache.org
> > Subject: Re: search by person name
> >
> > Hello,
> >
> > What does your query look like? Do you use a phrase query, like q=name:"ana
> jose" ?
> >
> > ---
> > Regards,
> > Rafał Kuć
> >
> >
> >
> >
> >> On 20 Apr 2015, at 15:06, Pedro Figueiredo <
> pjlfigueir...@criticalsoftware.com> wrote:
> >>
> >> Hi all,
> >>
> >> Can anyone advise the tokens and filters to use, for the most common
> way to search by people’s names.
> >> The basic requirements are:
> >>
> >> For field name – “Ana Maria José”
> >> The following searches should return the example:
> >> 1.   “Ana”
> >> 2.   “Maria”
> >> 3.   “Jose”
> >> 4.   “ana maria”
> >> 5.   “ana jose”
> >>
> >> With the following configuration I’m not able to satisfy all the
> searches (namely the last one….):
> >> 
> >> 
> >> 
> >>
> >> Thanks in advance,
> >>
> >> Pedro Figueiredo
> >> Senior Engineer
> >>
> >> pjlfigueir...@criticalsoftware.com
> >> 
> >> M. 934058150
> >>
> >>
> >> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
> >> T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
> >> 
> >>
> >> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
> >> LEVEL 5 RATED COMPANY  CMMI® is registered
> in the USPTO by CMU "
> >
> >
>


Re: Has anyone seen this error?

2015-04-20 Thread Erick Erickson
You have to provide a lot more context here, please review:
http://wiki.apache.org/solr/UsingMailingLists. The root of the problem
is often much farther down the exception trace.

Best,
Erick

On Mon, Apr 20, 2015 at 8:16 AM, vsilgalis  wrote:
> We are getting this on a couple of nodes wondering if there is a way to
> recover the node:
> Setting up to try to start recovery on replica http:///solr/classic_bt/
> after: org.apache.solr.common.SolrException: Conflict
>
> Thanks
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Has-anyone-seen-this-error-tp4200975.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: search by person name

2015-04-20 Thread Erick Erickson
First, a little patience on your part please, we're all volunteers here.

Second, what have you done to try to analyze the problem? Have you
tried adding &debug=query to your URL? Looked at the analysis page?
Anything else?

You might review: http://wiki.apache.org/solr/UsingMailingLists

My guess (and Rafal provided you a strong clue if my guess is right)
is that by enclosing "ana jose" in quotes you've created a phrase
query that requires the two words to be right next to each other and
they have "maria" between them. Using "slop", i.e. "ana jose"~2 should
find the doc if I'm correct.
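For instance (a sketch; the field name "name" follows Rafal's earlier example, and ~N is standard Lucene phrase-slop syntax):

```text
q=name:"ana jose"      phrase query, terms must be adjacent: no match
q=name:"ana jose"~2    slop of 2 allows a word ("maria") in between
```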

Best,
Erick

On Mon, Apr 20, 2015 at 7:41 AM, Pedro Figueiredo
 wrote:
> Any help please?
>
> PF
>
> -Original Message-
> From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com]
> Sent: 20 de abril de 2015 14:19
> To: solr-user@lucene.apache.org
> Subject: RE: search by person name
>
> yes
>
> Pedro Figueiredo
> Senior Engineer
>
> pjlfigueir...@criticalsoftware.com
> M. 934058150
>
>
> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
> 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
>
> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
> RATED COMPANY CMMI® is registered in the USPTO by CMU"
>
>
>
> -Original Message-
> From: Rafal Kuc [mailto:ra...@alud.com.pl]
> Sent: 20 de abril de 2015 14:10
> To: solr-user@lucene.apache.org
> Subject: Re: search by person name
>
> Hello,
>
> What does your query look like? Do you use a phrase query, like
> q=name:"ana jose"?
>
> ---
> Regards,
> Rafał Kuć
>
>
>
>
>> Wiadomość napisana przez Pedro Figueiredo 
>>  w dniu 20 kwi 2015, o godz. 15:06:
>>
>> Hi all,
>>
>> Can anyone advise the tokens and filters to use, for the most common way to 
>> search by people’s names.
>> The basics requirements are:
>>
>> For field name – “Ana Maria José”
>> The following searches should return the example:
>> 1.   “Ana”
>> 2.   “Maria”
>> 3.   “Jose”
>> 4.   “ana maria”
>> 5.   “ana jose”
>>
>> With the following configuration I’m not able to satisfy all the searches 
>> (namely the last one….):
>> 
>> 
>> 
>>
>> Thanks in advance,
>>
>> Pedro Figueiredo
>> Senior Engineer
>>
>> pjlfigueir...@criticalsoftware.com
>> 
>> M. 934058150
>>
>>
>> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
>> T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
>> 
>>
>> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
>> LEVEL 5 RATED COMPANY  CMMI® is registered in the 
>> USPTO by CMU "
>
>


Re: Search in Solr Index

2015-04-20 Thread Yavar Husain
There might be an issue with your default search field. If you are
searching a field named "MyTestField", then give your query as
MyTestField:Birmingham
and see if you get any results. As Matt suggested there might be some
issues with the way you have done tokenization/analysis etc.



On Mon, Apr 20, 2015 at 9:21 PM, Matt Kuiper  wrote:

> What type of field are you using? String?  If so try another type, like
> text_general.
>
> I believe with type String the contents are stored in the index exactly as
> they are inputted into the index.  So a search hit will have to match
> exactly the full value of the field, I assume in your case "Birmingham" is
> only part of the value.  With text_general and other types, the value will
> be tokenized and allow for hits on parts or variants of the value.
>
> Matt
>
>
> -Original Message-
> From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:
> vijaya.bhoomire...@whishworks.com]
> Sent: Monday, April 20, 2015 9:31 AM
> To: solr-user@lucene.apache.org
> Subject: Search in Solr Index
>
> Hi,
>
> I am indexing some data from a Database. Data is getting indexed properly
> and when I query in the Solr stock UI with query parameters as *.*, I could
> see the documents with all the fields listed and as well the numFound
> reflecting properly. However,  if I perform a query with a simple string
> for example "Birmingham", numFound returns 0 with no records to be
> displayed. There are records which are indexed that contains fields with
> the text "Birmingham". In the schema.xml, all the fields have been defined
> as indexed="true" and stored="true"
>
> This is happening for any search query string. What could be the reason
> for this behavior?
>
>
> Thanks & Regards
> Vijay
>
> --
> The contents of this e-mail are confidential and for the exclusive use of
> the intended recipient. If you receive this e-mail in error please delete
> it from your system immediately and notify us either by e-mail or
> telephone. You should not copy, forward or otherwise disclose the content
> of the e-mail. The views expressed in this communication may not
> necessarily be the view held by WHISHWORKS.
>


RE: Search in Solr Index

2015-04-20 Thread Matt Kuiper
What type of field are you using? String?  If so try another type, like 
text_general.

I believe with type String the contents are stored in the index exactly as they 
are inputted into the index.  So a search hit will have to match exactly the 
full value of the field, I assume in your case "Birmingham" is only part of the 
value.  With text_general and other types, the value will be tokenized and 
allow for hits on parts or variants of the value.
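As a sketch (the field names here are illustrative, not from your schema):

```xml
<!-- "string": the whole value is a single token, so only the exact
     full value, e.g. q=city_s:"Birmingham City Centre", would match -->
<field name="city_s" type="string"       indexed="true" stored="true"/>

<!-- "text_general": tokenized, so q=city_t:Birmingham matches the word
     anywhere inside a longer value -->
<field name="city_t" type="text_general" indexed="true" stored="true"/>
```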

Matt


-Original Message-
From: Vijaya Narayana Reddy Bhoomi Reddy 
[mailto:vijaya.bhoomire...@whishworks.com] 
Sent: Monday, April 20, 2015 9:31 AM
To: solr-user@lucene.apache.org
Subject: Search in Solr Index

Hi,

I am indexing some data from a Database. Data is getting indexed properly and 
when I query in the Solr stock UI with query parameters as *.*, I could see the 
documents with all the fields listed and as well the numFound reflecting 
properly. However,  if I perform a query with a simple string  for example 
"Birmingham", numFound returns 0 with no records to be displayed. There are 
records which are indexed that contains fields with the text "Birmingham". In 
the schema.xml, all the fields have been defined as indexed="true" and 
stored="true"

This is happening for any search query string. What could be the reason for 
this behavior?


Thanks & Regards
Vijay

--
The contents of this e-mail are confidential and for the exclusive use of the 
intended recipient. If you receive this e-mail in error please delete it from 
your system immediately and notify us either by e-mail or telephone. You should 
not copy, forward or otherwise disclose the content of the e-mail. The views 
expressed in this communication may not necessarily be the view held by 
WHISHWORKS.


Re: Multilevel nested level support using Solr

2015-04-20 Thread Steven White
Thanks Andy.

I have been thinking along the same lines as your solution, and it looks
like that is what I will have to do.

In summary, there is no built-in Solr way to achieve my need; I have to
construct my documents and build my queries to get this working.

Steve

On Mon, Apr 20, 2015 at 10:57 AM, Andrew Chillrud 
wrote:

> Don't know if this is what you are looking for, but we had a similar
> requirement. In our case each folder had a unique identifier associated
> with it.
>
> When generating the Solr input document our code populated 2 fields,
> parent_folder, and folder_hierarchy (multi-valued), and for a document in
> the root->foo->bar folder added:
>
> parent_folder:<id of bar>
> folder_hierarchy:<id of bar>
> folder_hierarchy:<id of foo>
> folder_hierarchy:<id of root>
>
> At search time, if you wanted to restrict your search within the folder
> 'bar' we generated a filter query for either 'parent_folder:<id of folder>'
> or 'folder_hierarchy:<id of folder>' depending on whether you
> wanted only documents directly under the 'bar' folder (your case 3), or at
> any level underneath 'bar' (your case 1).
>
> If your folders don't have unique identifiers then you could achieve
> something similar by indexing the folder paths in string fields:
>
> parent_folder:root|foo|bar
> folder_hierarchy:root|foo|bar
> folder_hierarchy:root|foo
> folder_hierarchy:root
>
> and generating a fq for either 'parent_folder:root|foo|bar' or
> 'folder_hierarchy:root|foo|bar'
>
> If you didn't want to have to generate all the permutations for the
> folder_hierarchy field before sending the document to Solr for indexing you
> should be able to do something like:
>
>   <fieldType name="folder_path" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>       <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="|"/>
>     </analyzer>
>   </fieldType>
>
>   <field name="parent_folder" type="string" indexed="true" stored="true"
>     multiValued="false"/>
>   <field name="folder_hierarchy" type="folder_path" indexed="true"
>     stored="true" multiValued="true"/>
>
>   <copyField source="parent_folder" dest="folder_hierarchy"/>
>
> In which case you could just send in the 'folder_parent' field and Solr
> would generate the folder_hierarchy field.
>
> For cases 2 and 4 you could do something similar by adding 2 additional
> fields that just index the folder names instead of the paths.
>
> - Andy -
>
> -Original Message-
> From: Steven White [mailto:swhite4...@gmail.com]
> Sent: Monday, April 20, 2015 9:49 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Multilevel nested level support using Solr
>
> Re sending to see if anyone can help.  Thanks
>
> Steve
>
> On Fri, Apr 17, 2015 at 12:14 PM, Steven White 
> wrote:
>
> > Hi folks,
> >
> > In my DB, my records are nested in a folder base hierarchy:
> >
> > <Folder_1>
> >   <Level_1>
> >     record_1
> >     record_2
> >     <Level_2>
> >       record_3
> >       record_4
> >     </Level_2>
> >     record_5
> >   </Level_1>
> > </Folder_1>
> > <Folder_2>
> >   record_6
> >   record_7
> >   record_8
> >
> > You got the idea.
> >
> > Is there anything in Solr that will let me preserve this structure and
> > thus when I'm searching to tell it in which level to narrow down the
> > search?  I have four search-level needs:
> >
> > 1) Be able to search inside one level by its full path:
> > Folder_1/Level_1/Level_2...* (and everything under Level_2 from this
> > path).
> >
> > 2) Be able to search inside a level regardless of its path: Level_2.*
> > (no matter where Level_2 is, I want to search on all records under
> > Level_2 and everything under its path).
> >
> > 3) Same as #1 but limit the search to within that level (nothing below
> > its level is searched).
> >
> > 4) Same as #2 but limit the search to within that level (nothing below
> > its level is searched).
> >
> > I found this:
> > https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+I
> > ndex+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
> > but it looks like it supports one level only and requires the whole
> > two levels be updated even if 1 of the doc in the nest is updated.
> >
> > Thanks
> >
> > Steve
> >
>


Re: Search in Solr Index

2015-04-20 Thread Vijaya Narayana Reddy Bhoomi Reddy
To add further, initially when I give *.*, numFound returns 14170. After
giving a search string, numFound returns 0. Now if I change the search
string back to *.*, numFound still returns 0.

I have to refresh the page completely to see 14170 again when *.* is given
as the search string.

Thanks & Regards
Vijay



On 20 April 2015 at 16:30, Vijaya Narayana Reddy Bhoomi Reddy <
vijaya.bhoomire...@whishworks.com> wrote:

> Hi,
>
> I am indexing some data from a Database. Data is getting indexed properly
> and when I query in the Solr stock UI with query parameters as *.*, I could
> see the documents with all the fields listed and as well the numFound
> reflecting properly. However,  if I perform a query with a simple string
>  for example "Birmingham", numFound returns 0 with no records to be
> displayed. There are records which are indexed that contains fields with
> the text "Birmingham". In the schema.xml, all the fields have been defined
> as indexed="true" and stored="true"
>
> This is happening for any search query string. What could be the reason
> for this behavior?
>
>
> Thanks & Regards
> Vijay
>
>

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Shawn,

If the user types "title:(Apache: Solr Notes)" (without quotes) then I want
Solr to treat the whole string as a raw text string, as if I escaped ":", "("
and ")" and any other reserved Solr keywords / tokens.  Using dismax it
worked for the ":" case, but I still get SyntaxError if I pass it the
following "title:(Apache: Solr Notes) AND" (here is the full URL):


http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title

So far, the only solution I can find is for my application to escape all
Solr operators before sending the string to Solr.  This is fine, but it
means my application will have to adapt to Solr's reserved operators as
Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add that to my
applications escape list).  A better solution would be to have Solr support
a new parameter that I can pass to Solr as part of the URL.
This parameter will tell Solr to do the escaping for me or not (missing
means the same as don't do the escaping).
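For reference, this is the set of characters the 4.x Lucene query parser treats as special; it should also match what SolrJ's ClientUtils.escapeQueryChars escapes, which may save the application from maintaining its own list client-side:

```text
+  -  &&  ||  !  (  )  {  }  [  ]  ^  "  ~  *  ?  :  \  /
```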

Thanks

Steve

On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey  wrote:

> On 4/20/2015 7:41 AM, Steven White wrote:
> > In my application, a user types "Apache Solr Notes".  I take that text
> and
> > send it over to Solr like so:
> >
> >
> >
> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
> >
> > And I get a hit on "Apache Solr Release Notes".  This is all good.
> >
> > Now if the same user types "Apache: Solr Notes" (notice the ":" after
> > "Apache") I will get a SyntaxError.  The fix is to escape ":" before I
> send
> > it to Solr.  What I want to figure out is how can I tell Solr / Lucene to
> > ignore ":" and escape it for me?  In this example, I used ":" but my need
> > is for all other operators and reserved Solr / Lucene characters.
>
> If we assume that what you did for the first query is what you will do
> for the second query, then this is what you would have sent:
>
> q=title:(Apache: Solr Notes)
>
> How is the parser supposed to know that only the second colon should be
> escaped, and not the first one?  If you escape them both (or treat the
> entire query string as query text), then the fact that you are searching
> the "title" field is lost.  The text "title" becomes an actual part of
> the query, and may not match, depending on what you have done with other
> parameters, such as the default operator.
>
> If you use the dismax parser (*NOT* the edismax parser, which parses
> field:value queries and boolean operator syntax just like the lucene
> parser), you may be able to achieve what you're after.
>
> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
> https://wiki.apache.org/solr/DisMaxQParserPlugin
>
> With dismax, you would use the qf and possibly the pf parameter to tell
> it which fields to search and send this as the query:
>
> q=Apache: Solr Notes
>
> Thanks,
> Shawn
>
>


Search in Solr Index

2015-04-20 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

I am indexing some data from a Database. Data is getting indexed properly
and when I query in the Solr stock UI with query parameters as *.*, I could
see the documents with all the fields listed and as well the numFound
reflecting properly. However,  if I perform a query with a simple string
 for example "Birmingham", numFound returns 0 with no records to be
displayed. There are indexed records that contain fields with
the text "Birmingham". In the schema.xml, all the fields have been defined
as indexed="true" and stored="true"

This is happening for any search query string. What could be the reason for
this behavior?


Thanks & Regards
Vijay

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-20 Thread Shawn Heisey
On 4/20/2015 8:44 AM, Rishi Easwaran wrote:
> Yeah I noticed that. Looks like optimize won't work since on some disks we 
> are already pretty full.
> Any thoughts on increasing/decreasing <mergeFactor>10</mergeFactor> or the
> ConcurrentMergeScheduler settings to make Solr do merges faster.

You don't have to do an optimize to need 2x disk space.  Even normal
merging, if it happens just right, can require the same disk space as a
full optimize.  Normal Solr operation requires that you have enough
space for your index to reach at least double size on occasion.

Higher merge factors are better for indexing speed, because merging
happens less frequently.  Lower merge factors are better for query
speed, at least after the merging finishes, because merging happens more
frequently and there are fewer total segments at any given moment. 
During a merge, there is so much I/O that query speed is often
negatively affected.
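To make the knobs concrete, these settings live in <indexConfig> in solrconfig.xml; a sketch using the values already discussed in this thread:

```xml
<indexConfig>
  <!-- higher = faster indexing, fewer merges; lower = faster queries -->
  <mergeFactor>10</mergeFactor>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- use 1 on spinning disks; ~3 is fine on SSD -->
    <int name="maxThreadCount">3</int>
    <int name="maxMergeCount">15</int>
  </mergeScheduler>
</indexConfig>
```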

Thanks,
Shawn



Has anyone seen this error?

2015-04-20 Thread vsilgalis
We are getting this on a couple of nodes wondering if there is a way to
recover the node:
Setting up to try to start recovery on replica http:///solr/classic_bt/
after: org.apache.solr.common.SolrException: Conflict

Thanks





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Has-anyone-seen-this-error-tp4200975.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Multilevel nested level support using Solr

2015-04-20 Thread Andrew Chillrud
Don't know if this is what you are looking for, but we had a similar 
requirement. In our case each folder had a unique identifier associated with it.

When generating the Solr input document our code populated 2 fields, 
parent_folder, and folder_hierarchy (multi-valued), and for a document in the 
root->foo->bar folder added:

parent_folder:<id of bar>
folder_hierarchy:<id of bar>
folder_hierarchy:<id of foo>
folder_hierarchy:<id of root>

At search time, if you wanted to restrict your search within the folder 'bar'
we generated a filter query for either 'parent_folder:<id of folder>' or
'folder_hierarchy:<id of folder>' depending on whether you wanted only
documents directly under the 'bar' folder (your case 3), or at any level
underneath 'bar' (your case 1).

If your folders don't have unique identifiers then you could achieve something 
similar by indexing the folder paths in string fields:

parent_folder:root|foo|bar
folder_hierarchy:root|foo|bar
folder_hierarchy:root|foo
folder_hierarchy:root

and generating a fq for either 'parent_folder:root|foo|bar' or 
'folder_hierarchy:root|foo|bar'

If you didn't want to have to generate all the permutations for the
folder_hierarchy field before sending the document to Solr for indexing you
should be able to do something like:

  <fieldType name="folder_path" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="|"/>
    </analyzer>
  </fieldType>

  <field name="parent_folder" type="string" indexed="true" stored="true"
    multiValued="false"/>
  <field name="folder_hierarchy" type="folder_path" indexed="true"
    stored="true" multiValued="true"/>

  <copyField source="parent_folder" dest="folder_hierarchy"/>

In which case you could just send in the 'folder_parent' field and Solr would 
generate the folder_hierarchy field.
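Assuming the analyzer above uses solr.PathHierarchyTokenizerFactory with delimiter="|" (an assumption, but one that fits the root|foo|bar examples in this thread), indexing the single value root|foo|bar emits all the path prefixes as tokens, so the permutations never need to be generated by hand:

```text
input:  root|foo|bar
tokens: root
        root|foo
        root|foo|bar
```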

For cases 2 and 4 you could do something similar by adding 2 additional fields 
that just index the folder names instead of the paths.

- Andy -

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: Monday, April 20, 2015 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Multilevel nested level support using Solr

Re sending to see if anyone can help.  Thanks

Steve

On Fri, Apr 17, 2015 at 12:14 PM, Steven White  wrote:

> Hi folks,
>
> In my DB, my records are nested in a folder base hierarchy:
>
> <Folder_1>
>   <Level_1>
>     record_1
>     record_2
>     <Level_2>
>       record_3
>       record_4
>     </Level_2>
>     record_5
>   </Level_1>
> </Folder_1>
> <Folder_2>
>   record_6
>   record_7
>   record_8
>
> You got the idea.
>
> Is there anything in Solr that will let me preserve this structure and
> thus when I'm searching to tell it in which level to narrow down the
> search?  I have four search-level needs:
>
> 1) Be able to search inside one level by its full path:
> Folder_1/Level_1/Level_2...* (and everything under Level_2 from this
> path).
>
> 2) Be able to search inside a level regardless of its path: Level_2.*
> (no matter where Level_2 is, I want to search on all records under
> Level_2 and everything under its path).
>
> 3) Same as #1 but limit the search to within that level (nothing below
> its level is searched).
>
> 4) Same as #2 but limit the search to within that level (nothing below
> its level is searched).
>
> I found this:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+I
> ndex+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
> but it looks like it supports one level only and requires the whole 
> two levels be updated even if 1 of the doc in the nest is updated.
>
> Thanks
>
> Steve
>


RE: Mutli term synonyms

2015-04-20 Thread Davis, Daniel (NIH/NLM) [C]
Handling MeSH descriptor preferred terms and such is similar.  I encountered
this during evaluation of Solr for a project here at NLM.  We decided to use
Solr for different projects instead. I considered the following approaches:
 - use a custom tokenizer at index time that indexed all of the multiple term 
alternatives.   
 - index the data, and then have an enrichment process that queries on each 
source synonym, and generates an update to add the target synonyms.  
   Follow this with an optimize.
 - During the indexing process, but before sending the data to Solr, process 
the data to tokenize and add synonyms to another field.

Both the custom tokenizer and enrichment process share the feature that they 
use Solr's own tokenizer rather than duplicate it.   The enrichment process 
seems to me only workable in environments where you can re-index all data 
periodically, so no continuous stream of data to index that needs to be handled 
relatively quickly once it is generated.  The last method of pre-processing
the data seems the least desirable to me from a blue-sky perspective, but is 
probably the easiest to implement and the most independent of Solr.
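For the index-time variants above, note that plain SynonymFilterFactory in an index-time analyzer can already apply a multi-word mapping like Kaushik's example (it is query-time analysis where multi-term synonyms break down, because the query parser splits on whitespace first). A sketch of the synonyms.txt entry, assuming a lowercase filter runs before the synonym filter:

```text
# synonyms.txt, applied in the index-time analyzer only
tween 20 => polysorbate 40
```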

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

-Original Message-
From: Kaushik [mailto:kaushika...@gmail.com] 
Sent: Monday, April 20, 2015 10:47 AM
To: solr-user@lucene.apache.org
Subject: Mutli term synonyms

Hello,

Reading up on synonyms it looks like there is no real solution for multi term 
synonyms. Is that right? I have a use case where I need to map one multi term 
phrase to another. i.e. Tween 20 needs to be translated to Polysorbate 40.

Any thoughts as to how this can be achieved?

Thanks,
Kaushik


Re: generate uuid/ id for table which do not have any primary key

2015-04-20 Thread Kaushik
Have you tried select <some unique column or expression> as id, name, age ?
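Alternatively (a sketch, not tested against your setup): Solr 4.4+ can generate the value server-side with solr.UUIDUpdateProcessorFactory in an update chain, which also covers updates arriving outside DIH; note the target field's type must accept the generated value (a uuid or string type):

```xml
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <!-- field to fill when the incoming document has no value for it -->
    <str name="fieldName">uuid</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain then has to be referenced via update.chain on the update handler used for indexing.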

On Thu, Apr 16, 2015 at 3:34 PM, Vishal Swaroop 
wrote:

> Just wondering if there is a way to generate uuid/ id in data-config
> without using combination of fields in query...
>
> data-config.xml
> <dataConfig>
>   <dataSource
>     batchSize="2000"
>     name="test"
>     type="JdbcDataSource"
>     driver="oracle.jdbc.OracleDriver"
>     url="jdbc:oracle:thin:@ldap:"
>     user="myUser"
>     password="pwd"/>
>   <document>
>     <entity
>       docRoot="true"
>       dataSource="test"
>       query="select name, age from test_user">
>     </entity>
>   </document>
> </dataConfig>
>
> On Thu, Apr 16, 2015 at 3:18 PM, Vishal Swaroop 
> wrote:
>
> > Thanks Kaushik & Erick..
> >
> > I can populate uuid by using a combination of fields, but I need to
> > change the type to "string" else it throws "Invalid UUID String":
> > <field name="uuid" type="string" indexed="true" stored="true"
> > required="true" multiValued="false"/>
> >
> > a) I will have ~80 millions records and wondering if performance might be
> > issue
> > b) So, during update I can still use combination of fields i.e. uuid ?
> >
> > On Thu, Apr 16, 2015 at 2:44 PM, Erick Erickson  >
> > wrote:
> >
> >> This seems relevant:
> >>
> >>
> >>
> http://stackoverflow.com/questions/16914324/solr-4-missing-required-field-uuid
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Apr 16, 2015 at 11:38 AM, Kaushik 
> wrote:
> >> > You seem to have defined the field, but not populating it in the
> query.
> >> Use
> >> > a combination of fields to come up with a unique id that can be
> >> assigned to
> >> > uuid. Does that make sense?
> >> >
> >> > Kaushik
> >> >
> >> > On Thu, Apr 16, 2015 at 2:25 PM, Vishal Swaroop  >
> >> > wrote:
> >> >
> >> >> How to generate uuid/ id (maybe in data-config.xml...) for table
> which
> >> do
> >> >> not have any primary key.
> >> >>
> >> >> Scenario :
> >> >> Using DIH I need to import data from database but table does not have
> >> any
> >> >> primary key
> >> >> I do have uuid defined in schema.xml as
> >> >> <field name="uuid" type="uuid" indexed="true" stored="true"
> >> >> required="true" multiValued="false"/>
> >> >> <uniqueKey>uuid</uniqueKey>
> >> >>
> >> >> data-config.xml
> >> >> <dataConfig>
> >> >>   <dataSource
> >> >>     batchSize="2000"
> >> >>     name="test"
> >> >>     type="JdbcDataSource"
> >> >>     driver="oracle.jdbc.OracleDriver"
> >> >>     url="jdbc:oracle:thin:@ldap:"
> >> >>     user="myUser"
> >> >>     password="pwd"/>
> >> >>   <document>
> >> >>     <entity
> >> >>       docRoot="true"
> >> >>       dataSource="test"
> >> >>       query="select name, age from test_user">
> >> >>     </entity>
> >> >>   </document>
> >> >> </dataConfig>
> >> >>
> >> >> Error : Document is missing mandatory uniqueKey field: uuid
> >> >>
> >>
> >
> >
>


Mutli term synonyms

2015-04-20 Thread Kaushik
Hello,

Reading up on synonyms it looks like there is no real solution for multi
term synonyms. Is that right? I have a use case where I need to map one
multi term phrase to another. i.e. Tween 20 needs to be translated to
Polysorbate 40.

Any thoughts as to how this can be achieved?

Thanks,
Kaushik


Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-20 Thread Rishi Easwaran
Yeah I noticed that. Looks like optimize won't work since on some disks we are
already pretty full.
Any thoughts on increasing/decreasing <mergeFactor>10</mergeFactor> or the
ConcurrentMergeScheduler settings to make Solr do merges faster?


 

 

 

-Original Message-
From: Gili Nachum 
To: solr-user 
Sent: Sun, Apr 19, 2015 12:34 pm
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


I assume you don't have much free space available in your disk. Notice that
during optimization (merge into a single segment) your shard replica space
usage may peak to 2x-3x of its normal size until optimization completes.
Is it a problem? Not if optimization occurs over shards serially and your
index is broken to many small shards.
On Apr 18, 2015 1:54 AM, "Rishi Easwaran"  wrote:

> Thanks Shawn for the quick reply.
> Our indexes are running on SSD, so 3 should be ok.
> Any recommendation on bumping it up?
>
> I guess will have to run optimize for entire solr cloud and see if we can
> reclaim space.
>
> Thanks,
> Rishi.
>
> -Original Message-
> From: Shawn Heisey 
> To: solr-user 
> Sent: Fri, Apr 17, 2015 6:22 pm
> Subject: Re: Solr Cloud reclaiming disk space from deleted documents
>
> On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
> > Running into an issue and wanted to see if anyone had some suggestions.
> > We are seeing this with both solr 4.6 and 4.10.3 code.
> > We are running an extremely update heavy application, with millions of
> > writes and deletes happening to our indexes constantly.  An issue we are
> > seeing is that solr cloud is not reclaiming the disk space that can be
> > used for new inserts by cleaning up deletes.
> >
> > We used to run optimize periodically with our old multicore set up, not
> > sure if that works for solr cloud.
> >
> > Num Docs: 28762340
> > Max Doc: 48079586
> > Deleted Docs: 19317246
> > Version: 1429299216227
> > Gen: 16525463
> > Size: 109.92 GB
> >
> > In our solrconfig.xml we use the following configs.
> >
> > false
> > 1000
> > 2147483647
> > 1
> > 10
> > <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
> > <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
> >   <int name="maxThreadCount">3</int>
> >   <int name="maxMergeCount">15</int>
> > </mergeScheduler>
> > 64
>
> This part of my response won't help the issue you wrote about, but it can
> affect performance, so I'm going to mention it.  If your indexes are
> stored on regular spinning disks, reduce mergeScheduler/maxThreadCount to
> 1.  If they are stored on SSD, then a value of 3 is OK.  Spinning disks
> cannot do seeks (read/write head moves) fast enough to handle multiple
> merging threads properly.  All the seek activity required will really
> slow down merging, which is a very bad thing when your indexing load is
> high.  SSD disks do not have to seek, so multiple threads are OK there.
>
> An optimize is the only way to reclaim all of the disk space held by
> deleted documents.  Over time, as segments are merged automatically,
> deleted doc space will be automatically recovered, but it won't be
> perfect, especially as segments are merged multiple times into very
> large segments.
>
> If you send an optimize command to a core/collection in SolrCloud, the
> entire collection will be optimized ... the cloud will do one shard
> replica (core) at a time until the entire collection has been optimized.
> There is no way (currently) to ask it to only optimize a single core, or
> to do multiple cores simultaneously, even if they are on different
> servers.
>
> Thanks,
> Shawn


RE: search by person name

2015-04-20 Thread Pedro Figueiredo
Any help please?

PF

-Original Message-
From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] 
Sent: 20 de abril de 2015 14:19
To: solr-user@lucene.apache.org
Subject: RE: search by person name

yes

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150
 

Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
RATED COMPANY CMMI® is registered in the USPTO by CMU"
 


-Original Message-
From: Rafal Kuc [mailto:ra...@alud.com.pl]
Sent: 20 de abril de 2015 14:10
To: solr-user@lucene.apache.org
Subject: Re: search by person name

Hello, 

What does your query look like? Do you use a phrase query, like q=name:"ana jose"?

---
Regards,
Rafał Kuć




> Wiadomość napisana przez Pedro Figueiredo 
>  w dniu 20 kwi 2015, o godz. 15:06:
> 
> Hi all,
>  
> Can anyone advise the tokens and filters to use, for the most common way to 
> search by people’s names.
> The basics requirements are:
>  
> For field name – “Ana Maria José”
> The following searches should return the example:
> 1.   “Ana”
> 2.   “Maria”
> 3.   “Jose”
> 4.   “ana maria”
> 5.   “ana jose”
>  
> With the following configuration I’m not able to satisfy all the searches 
> (namely the last one….):
>   
>
> 
> 
>  
> Thanks in advance,
>  
> Pedro Figueiredo
> Senior Engineer
> 
> pjlfigueir...@criticalsoftware.com
> 
> M. 934058150
>  
> 
> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal 
> T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com 
> 
> 
> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® 
> LEVEL 5 RATED COMPANY  CMMI® is registered in the 
> USPTO by CMU "




RE: JSON Facet & Analytics API in Solr 5.1

2015-04-20 Thread Davis, Daniel (NIH/NLM) [C]
Indeed - XML is not "human readable" if it contains colons, JSON is not "human 
readable" if it is too deep, and the objects/keys are not semantic.
I also vote for flatter.

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, April 17, 2015 11:16 PM
To: solr-user@lucene.apache.org
Subject: Re: JSON Facet & Analytics API in Solr 5.1

Flatter please.  The other nested stuff makes my head hurt.  Until recently I 
thought I was the only person on the planet who had a hard time mentally 
parsing anything but the simplest JSON, but then I learned that I'm not alone 
at all it's just that nobody is saying it. :)

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & 
Elasticsearch Support * http://sematext.com/



On Fri, Apr 17, 2015 at 7:26 PM, Trey Grainger  wrote:

> Agreed, I also prefer the second way. I find it more readible, less 
> verbose while communicating the same information, less confusing to 
> mentally parse ("is 'terms' the name of my facet, or the type of my 
> facet?..."), and less prone to syntactically valid, but logically 
> invalid inputs.  Let's break those topics down.
>
> *1) Less verbose while communicating the same information:* The 
> flatter structure is particularly useful when you have nested facets 
> to reduce unnecessary verbosity / extra levels. Let's contrast the two 
> approaches with just 2 levels of subfacets:
>
> ** Current Format **
> top_genres: {
>   terms: {
>     field: genre,
>     limit: 5,
>     facet: {
>       top_authors: {
>         terms: {
>           field: author,
>           limit: 4,
>           facet: {
>             top_books: {
>               terms: {
>                 field: title,
>                 limit: 5
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> ** Flat Format **
> top_genres: {
>   type: terms,
>   field: genre,
>   limit: 5,
>   facet: {
>     top_authors: {
>       type: terms,
>       field: author,
>       limit: 4,
>       facet: {
>         top_books: {
>           type: terms,
>           field: title,
>           limit: 5
>         }
>       }
>     }
>   }
> }
>
> The flat format is clearly shorter and more succinct, while 
> communicating the same information. What value do the extra levels add?
>
>
> *2) Less confusing to mentally parse*
> I also find the flatter structure less confusing, as I'm consistently 
> having to take a mental pause with the current format to verify 
> whether "terms" is the name of my facet or the type of my facet and 
> have to count the curly braces to figure this out.  Not that I would 
> name my facets like this, but to give an extreme example of why that 
> extra mental calculation is necessary due to the name of an attribute 
> in the structure being able to represent both a facet name and facet type:
>
terms: {
>   terms: {
>     field: genre,
>     limit: 5,
>     facet: {
>       terms: {
>         terms: {
>           field: author,
>           limit: 4
>         }
>       }
>     }
>   }
> }
>
> In this example, the first "terms" is a facet name, the second "terms" 
> is a facet type, the third is a facet name, etc. Even if you don't 
> name your facets like this, it still requires parsing someone else's 
> query mentally to ensure that's not what was done.
>
> 3) *Less prone to syntactically valid, but logically invalid inputs* 
> Also, given this first format (where the type is indicated by one of 
> several possible attributes: terms, range, etc.), what happens if I 
> pass in multiple of the valid JSON attributes... the flatter structure 
> prevents this from being possible (which is a good thing!):
>
> top_authors: {
>   terms: {
>     field: author,
>     limit: 5
>   },
>   range: {
>     field: price,
>     start: 0,
>     end: 100,
>     gap: 20
>   }
> }
>
> I don't think the response format can currently handle this without 
> adding in extra levels to make it look like the input side, so this is 
> an exception case even though it seems syntactically valid.
>
> So in conclusion, I'd give a strong vote to the flatter structure. Can 
> someone enumerate the benefits of the current format over the flatter 
> structure (I'm probably dense and just failing to see them currently)?
>
> Thanks,
>
> -Trey
>
>
> On Fri, Apr 17, 2015 at 2:28 PM, Jean-Sebastien Vachon < 
> jean-sebastien.vac...@wantedanalytics.com> wrote:
>
> > I prefer the second way. I find it more readable and shorter.
> >
> > Thanks for making Solr even better ;)
> >
> > 
> > From: Yonik Seeley 
> > Se

Re: Multilevel nested level support using Solr

2015-04-20 Thread Doug Turnbull
You might want to look at SirenDB from Sindice. It's a Lucene codec that
allows native modeling of arbitrary hierarchies.

http://siren.sindice.com

On Friday, April 17, 2015, Steven White  wrote:

> Hi folks,
>
> In my DB, my records are nested in a folder-based hierarchy:
>
> 
> 
> record_1
> record_2
> 
> record_3
> record_4
> 
> record_5
> 
> 
> 
> record_6
> record_7
> record_8
>
> You got the idea.
>
> Is there anything in Solr that will let me preserve this structure and thus,
> when I'm searching, tell it at which level to narrow down the search?  I
> have four search-level needs:
>
> 1) Be able to search inside only level: ...* (and
> everything under Level_2 from this path).
>
> 2) Be able to search inside a level regardless of its path: .* (no
> matter where  is, I want to search on all records under Level_2
> and everything under its path.)
>
> 3) Same as #1 but limit the search to within that level (nothing below its
> level is searched).
>
> 4) Same as #2 but limit the search to within that level (nothing below its
> level is searched).
>
> I found this:
>
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
> but it looks like it supports one level only and requires the whole two
> levels be updated even if 1 of the doc in the nest is updated.
>
> Thanks
>
> Steve
>


-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Taming Search  from Manning
Publications
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.
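No concrete schema came up in this thread; for a folder-based hierarchy like the one described above, Solr's PathHierarchyTokenizerFactory is a natural fit. A hedged sketch (the field type name and delimiter are made up for illustration):

```xml
<!-- Illustrative sketch: index each record's full folder path so that a
     query on this field matches the record and everything under that path.
     /Level_1/Level_2 is indexed as the tokens /Level_1 and /Level_1/Level_2. -->
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With each record carrying its folder path in such a field, fq=folder:/Level_1/Level_2 restricts a search to that whole subtree; a plain string copy of the exact path (or of the immediate folder name alone) would cover the "this level only" and "any path" variants.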


Deploying SolrCloud 5 on Windows

2015-04-20 Thread Rahiem Burgess
Hello all,

I am new to Solr and I am looking for advice and tips on deploying SolrCloud in a 
Windows production environment.

Rahiem S. Burgess
Sr. Software Engineer
Enterprise Integration Services
Johns Hopkins University
5801 Smith Avenue
Davis Building
Baltimore, MD  21209

Email: rah...@jhu.edu


Re: Differentiating user search term in Solr

2015-04-20 Thread Shawn Heisey
On 4/20/2015 7:41 AM, Steven White wrote:
> In my application, a user types "Apache Solr Notes".  I take that text and
> send it over to Solr like so:
>
>
> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
>
> And I get a hit on "Apache Solr Release Notes".  This is all good.
>
> Now if the same user types "Apache: Solr Notes" (notice the ":" after
> "Apache") I will get a SyntaxError.  The fix is to escape ":" before I send
> it to Solr.  What I want to figure out is how can I tell Solr / Lucene to
> ignore ":" and escape it for me?  In this example, I used ":" but my need
> is for all other operators and reserved Solr / Lucene characters.

If we assume that what you did for the first query is what you will do
for the second query, then this is what you would have sent:

q=title:(Apache: Solr Notes)

How is the parser supposed to know that only the second colon should be
escaped, and not the first one?  If you escape them both (or treat the
entire query string as query text), then the fact that you are searching
the "title" field is lost.  The text "title" becomes an actual part of
the query, and may not match, depending on what you have done with other
parameters, such as the default operator.

If you use the dismax parser (*NOT* the edismax parser, which parses
field:value queries and boolean operator syntax just like the lucene
parser), you may be able to achieve what you're after.

https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
https://wiki.apache.org/solr/DisMaxQParserPlugin

With dismax, you would use the qf and possibly the pf parameter to tell
it which fields to search and send this as the query:

q=Apache: Solr Notes
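Put together with the earlier URL, such a request might look like this (a sketch assuming the same core and field; with dismax, mm=100% rather than q.op=AND makes all terms mandatory):

```
http://localhost:8983/solr/db/select?defType=dismax&qf=title&mm=100%25&q=Apache%3A%20Solr%20Notes&fl=id%2Cscore%2Ctitle&wt=xml&indent=true
```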

Thanks,
Shawn



Re: Multilevel nested level support using Solr

2015-04-20 Thread Steven White
Resending to see if anyone can help.  Thanks

Steve

On Fri, Apr 17, 2015 at 12:14 PM, Steven White  wrote:

> Hi folks,
>
> In my DB, my records are nested in a folder-based hierarchy:
>
> 
> 
> record_1
> record_2
> 
> record_3
> record_4
> 
> record_5
> 
> 
> 
> record_6
> record_7
> record_8
>
> You got the idea.
>
> Is there anything in Solr that will let me preserve this structure and thus,
> when I'm searching, tell it at which level to narrow down the search?  I
> have four search-level needs:
>
> 1) Be able to search inside only level: ...* (and
> everything under Level_2 from this path).
>
> 2) Be able to search inside a level regardless of its path: .* (no
> matter where  is, I want to search on all records under Level_2
> and everything under its path.)
>
> 3) Same as #1 but limit the search to within that level (nothing below its
> level is searched).
>
> 4) Same as #2 but limit the search to within that level (nothing below its
> level is searched).
>
> I found this:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
> but it looks like it supports one level only and requires the whole two
> levels be updated even if 1 of the doc in the nest is updated.
>
> Thanks
>
> Steve
>


Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Hoss,

Thanks for that lengthy feedback, it is much appreciated.

Let me reset and bear in mind that I'm new to Solr.

I'm using Solr 5.0 (will switch over to 5.1 later this week) and my need is
as follows.

In my application, a user types "Apache Solr Notes".  I take that text and
send it over to Solr like so:


http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND

And I get a hit on "Apache Solr Release Notes".  This is all good.

Now if the same user types "Apache: Solr Notes" (notice the ":" after
"Apache") I will get a SyntaxError.  The fix is to escape ":" before I send
it to Solr.  What I want to figure out is how can I tell Solr / Lucene to
ignore ":" and escape it for me?  In this example, I used ":" but my need
is for all other operators and reserved Solr / Lucene characters.

This needs to be configurable via a URL parameter to Solr / Lucene because
there are times I will send text to Solr that has valid operators and other
times not.  If such a URL parameter exists, then my client application no
longer has to maintain a list of operators to escape and it doesn't have to
keep up with Solr as new operators are added.
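For reference, the client-side list is small; a hedged sketch of an escaper (the character set follows the Lucene query syntax documentation, and the function name is made up -- verify the list against the Solr version in use):

```python
import re

# Characters the Lucene/Solr "lucene" query parser treats as syntax.
# Escaping & and | individually also covers the two-character
# operators && and ||.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

def escape_lucene(text: str) -> str:
    """Backslash-escape Lucene query syntax characters in user input."""
    return _LUCENE_SPECIAL.sub(r'\\\1', text)

print(escape_lucene("Apache: Solr Notes"))  # Apache\: Solr Notes
```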

What do you think?  I hope I got my message across better this time.

PS: Looking at
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser
seems to be promising, but it doesn't include an example so I wasn't able to
figure it out, and it looks to me like the list of operators is not complete
(there is no "{" for example)

Thanks

Steve

On Fri, Apr 17, 2015 at 3:02 PM, Chris Hostetter 
wrote:

>
> : It looks to me that "f" with "qq" is doing phrase search, that's not
> what I
> : want.  The data in the field "title" is "Apache Solr Release Notes"
>
> if you don't want phrase queries then you don't want phrase queries and
> that's fine -- but it wasn't clear from any of your original emails
> because you never provided (that I saw) any concrete examples of the types
> of queries you expected, the types of matches you wanted, and the types of
> matches you did *NOT* want.  Details matter.
>
> https://wiki.apache.org/solr/UsingMailingLists
>
>
> Based on that one concrete example i've now seen of what you *do* want to
> match: it seems that maybe a general description of your objective is that
> each of the "words" in your user input should treated as a mandatory
> clause in a boolean query -- but the concept of a "word" is already
> something that violates your earlier statement about not wanting the query
> parser to treat any "reserved characters" as special -- in order to
> recognize that "Apache", "Solr" and "Notes" should each be treated as
> independent mandatory clauses in a boolean query, then some query parser
> needs to recognize that *whitespace* is a syntactically significant
> character in your query string: it's what separates the "words" in your
> input.
>
> the reason the "field" parser produces phrase queries in the example URLs
> you mentioned is because that parser doesn't have *ANY* special reserved
> characters -- not even whitespace.  it passes the entire input string to
> the analyzer of the configured (f) field.  if you are using TextField with
> a Tokenizer that means it gets split on whitespace, resulting in multiple
> *sequential* tokens, which will result in a phrase query (on the other
> hand, using something like StrField will cause the entire input string,
> spaces and all, to be searched as one single Term)
>
> : I looked over the links you provided and tried out the examples, in each
> : case if the user-typed-text contains any reserved characters, it will
> fail
> : with a syntax error (the exception is when I used "f" and "qq" but like I
> : said, that gave me 0 hit).
>
> As i said: Details matter.  which examples did you try? what configs were
> you using? what data where you using? which version of solr are you using?
> what exactly was the syntax error? etc ?
>
> "f" and "qq" are not magic -- saying you used them just means you used
> *some* parser that supports an "f" param ... if you tried it with the
> "term" or "field" parser then i don't know why you would have gotten a
> SyntaxError, but based on your goal it sounds like those parsers aren't
> really useful to you. (see below)
>
> : If you can give me a concrete example, please do.  My need is to pass to
> : Solr the text "Apache: Solr Notes" (without quotes) and get a hit as if I
> : passed "Apache\: Solr Notes" ?
>
> To re-iterate, saying you want the same behavior as if you passed "Apache\:
> Solr Notes" is a vague statement -- as if you passed that string to *what*?
> To the standard parser? To the dismax parser? Using what request
> options? (q.op? qf? df?) ... query strings don't exist in a vacuum.  The
> details & context matter.
>
> (I'm sorry if it feels like i keep hitting you over the head about this,
> i'm just trying to help you realize the breadth and scope of the variab

RE: search by person name

2015-04-20 Thread Pedro Figueiredo
yes

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150
 

Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU"
 


-Original Message-
From: Rafal Kuc [mailto:ra...@alud.com.pl] 
Sent: 20 de abril de 2015 14:10
To: solr-user@lucene.apache.org
Subject: Re: search by person name

Hello, 

What does your query look like? Do you use a phrase query, like q=name:"ana jose"?

---
Regards,
Rafał Kuć




> Wiadomość napisana przez Pedro Figueiredo 
>  w dniu 20 kwi 2015, o godz. 15:06:
> 
> Hi all,
>  
> Can anyone advise on the tokenizers and filters to use for the most common way to 
> search by people's names?
> The basic requirements are:
>  
> For field name – “Ana Maria José”
> The following searches should match the example:
> 1.   “Ana”
> 2.   “Maria”
> 3.   “Jose”
> 4.   “ana maria”
> 5.   “ana jose”
>  
> With the following configuration I’m not able to satisfy all the searches 
> (namely the last one….):
>   
>
> 
> 
>  
> Thanks in advance,
>  
> Pedro Figueiredo




Re: search by person name

2015-04-20 Thread Rafal Kuc
Hello, 

What does your query look like? Do you use a phrase query, like q=name:"ana jose"?

---
Regards,
Rafał Kuć




> Wiadomość napisana przez Pedro Figueiredo 
>  w dniu 20 kwi 2015, o godz. 15:06:
> 
> Hi all,
>  
> Can anyone advise on the tokenizers and filters to use for the most common way to 
> search by people's names?
> The basic requirements are:
>  
> For field name – “Ana Maria José”
> The following searches should match the example:
> 1.   “Ana”
> 2.   “Maria”
> 3.   “Jose”
> 4.   “ana maria”
> 5.   “ana jose”
>  
> With the following configuration I’m not able to satisfy all the searches 
> (namely the last one….):
>   
>
>  
> 
>  
> Thanks in advance,
>  
> Pedro Figueiredo



search by person name

2015-04-20 Thread Pedro Figueiredo
Hi all,

 

Can anyone advise on the tokenizers and filters to use for the most common way to
search by people's names?

The basic requirements are:

 

For field name – “Ana Maria José”

The following searches should match the example:

1.   “Ana”

2.   “Maria”

3.   “Jose”

4.   “ana maria”

5.   “ana jose”

 

With the following configuration I’m not able to satisfy all the searches
(namely the last one….):




 



 

Thanks in advance,

 


Pedro Figueiredo
Senior Engineer

 
pjlfigueir...@criticalsoftware.com
M. 934058150


 




Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY. CMMI® is registered in the USPTO by CMU.
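Since the configuration above was stripped from the archive, here is a hedged sketch of a fieldType that would satisfy all five searches (names are illustrative): ASCIIFoldingFilterFactory folds "José" to "jose", and a non-phrase boolean query such as q=name:(ana AND jose) satisfies search 5, because both tokens occur in the field even though "maria" sits between them.

```xml
<fieldType name="person_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Folds accented characters: José -> jose -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```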


 

 

 



Re: Correspondance table ?

2015-04-20 Thread Bruno Mannina

Hi Jack,

OK, it's not for many millions of users, just a maximum of 100 per day.
It will be used on traditional "PC" and also on mobile clients.

Then I need to run tests to verify the possibility.

Thx

Le 20/04/2015 14:20, Jack Krupansky a écrit :

It depends on the specific nature of your clients. Are they in-house users,
like only dozens or hundreds, or is this a large web app with many millions
of users and with mobile clients as well as traditional "PC" clients?

If it feels too much to do in the client, then a middleware API service
layer could be the way to go. In any case, don't try to load too much work
onto the Solr server itself.

-- Jack Krupansky

On Mon, Apr 20, 2015 at 7:32 AM, Bruno Mannina  wrote:


Hi Alex,

Well, OK, but what if I have a big table? More than 10,000 entries?
Is it safe to do that client side?

Note: I have one little table,
but I also have 2 big tables for 2 other fields.


Le 20/04/2015 10:57, Alexandre Rafalovitch a écrit :


The best place to do so is in the client software, since you are not
using it for search in any way. So, wherever you get your Solr's
response JSON/XML/etc, map it there.

Regards,
 Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 20 April 2015 at 18:23, Bruno Mannina  wrote:


Dear Solr Users,

Solr 5.0.0

I actually have around 90,000,000 docs in my Solr, and I have a field
with one char which represents a category, e.g.:
value = a, definition: nature and health
etc...
I have a few categories, around 15.

These category definitions can change over the years.

Can I use a file where I will have
a\tNature and Health
b\tComputer science
etc...

and instead of having the code letter in my Solr JSON result, I will have
the definition?
Only in the result.
The query will be done with the code letter.

I'm sure it's possible!

Additional question: is it possible to do that also with a big
correspondence file? Around 5,000 definitions?

Thanks for your help,
Bruno

---
This e-mail contains no viruses or malware because avast! Antivirus protection is active.
http://www.avast.com











Can't find result of autophrase filter

2015-04-20 Thread Mike Thomsen
This is the content of my autophrases.txt file:

al qaeda in the arabian peninsula
seat belt

I've attached a screenshot showing the analysis view of the index. When I
query for al_qaeda_in_the_arabian_peninsula or
alqaedainthearabianpeninsula, nothing comes back even though at least the
latter appears to be a token that makes it all the way through the index
filter chain.

I'm just using this to find it:

search_text:alqaedainthearabianpeninsula

Any ideas about why this isn't returning anything?

This is the field type declaration:


  







  
  







  



Re: Correspondance table ?

2015-04-20 Thread Jack Krupansky
It depends on the specific nature of your clients. Are they in-house users,
like only dozens or hundreds, or is this a large web app with many millions
of users and with mobile clients as well as traditional "PC" clients?

If it feels too much to do in the client, then a middleware API service
layer could be the way to go. In any case, don't try to load too much work
onto the Solr server itself.

-- Jack Krupansky

On Mon, Apr 20, 2015 at 7:32 AM, Bruno Mannina  wrote:

> Hi Alex,
>
> Well, OK, but what if I have a big table? More than 10,000 entries?
> Is it safe to do that client side?
>
> Note: I have one little table,
> but I also have 2 big tables for 2 other fields.
>
>
> Le 20/04/2015 10:57, Alexandre Rafalovitch a écrit :
>
>> The best place to do so is in the client software, since you are not
>> using it for search in any way. So, wherever you get your Solr's
>> response JSON/XML/etc, map it there.
>>
>> Regards,
>> Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 20 April 2015 at 18:23, Bruno Mannina  wrote:
>>
>>> Dear Solr Users,
>>>
>>> Solr 5.0.0
>>>
>>> I actually have around 90,000,000 docs in my Solr, and I have a field
>>> with one char which represents a category, e.g.:
>>> value = a, definition: nature and health
>>> etc...
>>> I have a few categories, around 15.
>>>
>>> These category definitions can change over the years.
>>>
>>> Can I use a file where I will have
>>> a\tNature and Health
>>> b\tComputer science
>>> etc...
>>>
>>> and instead of having the code letter in my Solr JSON result, I will have
>>> the definition?
>>> Only in the result.
>>> The query will be done with the code letter.
>>>
>>> I'm sure it's possible!
>>>
>>> Additional question: is it possible to do that also with a big
>>> correspondence file? Around 5,000 definitions?
>>>
>>> Thanks for your help,
>>> Bruno
>>>
>>>
>>>
>>
>
>
>


Re: variable length ngramfilter highlights

2015-04-20 Thread Bjørn Hjelle
Dan, you could try to add luceneMatchVersion="4.3" to your fieldType, like
so:



That worked for me with Solr versions prior to Solr 5.
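The snippet itself was stripped from the archive; a hedged reconstruction of what such a fieldType might look like (the luceneMatchVersion attribute on the index analyzer is the relevant part; the tokenizer, filters, and gram sizes here are illustrative):

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index" luceneMatchVersion="4.3">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```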

Bjørn

On Thu, Apr 9, 2015 at 2:19 PM, Dan Sullivan  wrote:

> Hi,
>
>
> I apologize if this question is redundant.  I've spent a few days on it and
> scoured the Internet; I know that this question has been asked and answered
> in various capacities for different versions of Solr; the reason I am
> inquiring to this mailing list is because what I am attempting to do seems
> to be supported in the Solr API documentation at the following URL:
>
>
>
>
> https://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/anal
> ysis/ngram/NGramTokenFilter.html
>
>
>
> Here is what I am trying to do; I have a single text field that contains a
> large amount of data (it's not huge, but it may contain more than 2048
> characters of data for example).  What I would like to do is have full
> search capabilities (for a single input term that is a word, i.e. 'a' or
> 'queue') via a variable length NGramFilter with a size of 1..10 (for
> example).   I've read various posts that partial highlighting on variable
> length NGramFilters is 'broken' or that fast vector highlighting cannot be
> used.  Basically, it seems that I can search using NGramFilters, however
> the
> highlights that are being returned are inaccurate.
>
>
>
> I think my question is fundamental  in nature; should I be able to get
> accurate partial highlights of a variable length NGramfilter with any
> version of Solr (using any highlighter, standard fast vector or otherwise)?
> The documentation I linked above suggests it is possible.
>
>
>
> I appreciate you taking the time to help me.
>
>
>
> I have tried numerous configurations to no avail, so it might be moot to
> post my configuration, however here it is.
>
>
>
> schema.xml - https://gist.github.com/dsulli99/c1d8f3536ade65e8eb35
>
> solrconfig.xml https://gist.github.com/dsulli99/10e2af507cde4373adba
>
>
>
> Thank you,
>
>
>
> Dan
>
>
>
>
>
>
>
>


Solr 5: hit highlight with NGram/EdgeNgram-fields

2015-04-20 Thread Bjørn Hjelle
With Solr 4.10.3 I was advised to set luceneMatchVersion to "4.3" to make
hit highlighting work with NGram/EdgeNGram fields, like this:

 

In Solr 5 and 5.1 this seems to not work any more.
The complete word is  highlighted, not just the part that matches the
search term.

In the Solr admin analysis page it again does not show the proper end-offset
positions. What it shows is this:

text            t      te       tes         test
raw_bytes       [74]   [74 65]  [74 65 73]  [74 65 73 74]
start           0      0        0           0
end             4      4        4           4
positionLength  1      1        1           1
type            word   word     word        word
position        1      1        1           1

In Solr 4.10.3 with luceneMatchVersion set to "4.3" the end offsets would be 1,
2, 3, 4 and hit highlighting would work.

Any advice on making hit highlighting work with (Edge)NGram fields would be
highly appreciated!

Thanks,
Bjørn


Re: Correspondance table ?

2015-04-20 Thread Bruno Mannina

Hi Alex,

Well, OK, but what if I have a big table? More than 10,000 entries?
Is it safe to do that client side?

Note: I have one little table,
but I also have 2 big tables for 2 other fields.

Le 20/04/2015 10:57, Alexandre Rafalovitch a écrit :

The best place to do so is in the client software, since you are not
using it for search in any way. So, wherever you get your Solr's
response JSON/XML/etc, map it there.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 20 April 2015 at 18:23, Bruno Mannina  wrote:

Dear Solr Users,

Solr 5.0.0

I actually have around 90,000,000 docs in my Solr, and I have a field with
one char which represents a category, e.g.:
value = a, definition: nature and health
etc...
I have a few categories, around 15.

These category definitions can change over the years.

Can I use a file where I will have
a\tNature and Health
b\tComputer science
etc...

and instead of having the code letter in my Solr JSON result, I will have
the definition?
Only in the result.
The query will be done with the code letter.

I'm sure it's possible!

Additional question: is it possible to do that also with a big
correspondence file? Around 5,000 definitions?

Thanks for your help,
Bruno










Re: Unsubscribe from Mailing list

2015-04-20 Thread Ere Maijala
There's a wiki page about possible issues and solutions for 
unsubscribing, see 
https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists.


Regards,
Ere

20.4.2015, 12.23, Isha Garg kirjoitti:

Hi ,

Can anyone tell me how to unsubscribe from Solr  mailing lists. I tried sending 
email on 'solr-user-unsubscr...@lucene.apache.org', 
'general-unsubscr...@lucene.apache.org'. But it is not working for me.

Thanks & Regards,
Isha Garg
RAGE Frameworks/CreditPointe Services Pvt. LTD
India Off: +91 (20) 4141 3000 Ext:3043
www.rageframeworks.com
 
www.creditpointe.com





*

NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the above-named 
person(s). If you are not the intended recipient, notify the sender 
immediately, delete this email from your system and do not disclose or use for 
any purpose. All emails are scanned for any virus and monitored as per the 
Company information security policies and practices.


*
---
  This email has been scanned for email related threats and delivered safely by 
Mimecast.
  For more information please visit http://www.mimecast.com
---




--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Unsubscribe from Mailing list

2015-04-20 Thread Isha Garg
Hi ,

Can anyone tell me how to unsubscribe from Solr  mailing lists. I tried sending 
email on 'solr-user-unsubscr...@lucene.apache.org', 
'general-unsubscr...@lucene.apache.org'. But it is not working for me.

Thanks & Regards,
Isha Garg
RAGE Frameworks/CreditPointe Services Pvt. LTD
India Off: +91 (20) 4141 3000 Ext:3043
www.rageframeworks.com
 
www.creditpointe.com







Re: Correspondance table ?

2015-04-20 Thread Alexandre Rafalovitch
The best place to do so is in the client software, since you are not
using it for search in any way. So, wherever you get your Solr's
response JSON/XML/etc, map it there.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 20 April 2015 at 18:23, Bruno Mannina  wrote:
> Dear Solr Users,
>
> Solr 5.0.0
>
> I actually have around 90,000,000 docs in my Solr, and I have a field with
> one char which represents a category, e.g.:
> value = a, definition: nature and health
> etc...
> I have a few categories, around 15.
>
> These category definitions can change over the years.
>
> Can I use a file where I will have
> a\tNature and Health
> b\tComputer science
> etc...
>
> and instead of having the code letter in my Solr JSON result, I will have
> the definition?
> Only in the result.
> The query will be done with the code letter.
>
> I'm sure it's possible!
>
> Additional question: is it possible to do that also with a big
> correspondence file? Around 5,000 definitions?
>
> Thanks for your help,
> Bruno
>
>
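The client-side mapping suggested above is tiny even at the 5,000-entry scale mentioned. A hedged sketch (the file format is taken from the a\tNature and Health example in the thread; the function names are made up):

```python
def load_mapping(path):
    """Load a tab-separated code -> definition file into a dict."""
    mapping = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            code, _, definition = line.partition("\t")
            mapping[code] = definition
    return mapping

def translate_docs(docs, field, mapping):
    """Replace code letters with definitions in parsed Solr response docs.

    Unknown codes are left as-is so a stale mapping file never hides data.
    """
    for doc in docs:
        if field in doc:
            doc[field] = mapping.get(doc[field], doc[field])
    return docs
```

The 90-million-doc index size is irrelevant here: the mapping is applied only to the page of results actually returned, and a 5,000-entry dict lookup is O(1) per document.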


Correspondance table ?

2015-04-20 Thread Bruno Mannina

Dear Solr Users,

Solr 5.0.0

I actually have around 90,000,000 docs in my Solr, and I have a field
with one char which represents a category, e.g.:
value = a, definition: nature and health
etc...
I have a few categories, around 15.

These category definitions can change over the years.

Can I use a file where I will have
a\tNature and Health
b\tComputer science
etc...

and instead of having the code letter in my Solr JSON result, I will
have the definition?
Only in the result.
The query will be done with the code letter.

I'm sure it's possible!

Additional question: is it possible to do that also with a big
correspondence file? Around 5,000 definitions?

Thanks for your help,
Bruno




Find out which MultiValued field got a hit (and a custom highlighter)

2015-04-20 Thread Rodolfo Zitellini
Dear List,
I have been studying Solr to build an index of musical incipits encoded
as strings in bibliographical records, to retrofit this kind of search onto
an existing database.
Basically we store the incipit data (filtered through a custom TokenFilter)
as a multi-valued field (one for each different incipit) inside a doc
(which is my bibliographical record).
Searching is very good and precise, but I have a problem: I cannot figure
out how to know which one of the values in my multivalued field generated
the hit! I only get a reference to a whole doc (and when I have 40 incipits
it is a bit of a problem). I thought I could just manually cycle all the
incipits in all the matching docs and match by hand, but this is not easy
since the values are mangled by the TokenFilter.
I saw some references that a solution to this problem is to use a
Highlighter and then extract the matching value. In principle this works (I
used FastVectorHighligter), but I have an additional problem: when
searching very broad queries (like all the incipits which start with a C) I
obviously get a big list of results (65k), but the highlighter would match
only a very small subset of them (2 or 3), whilst the query would return
all the correct (paginated) results. Attaching a debugger and tracing the
highlighter code, I found that the FieldQuery just rewrites the first 1024
queries, and hence for all the results > 1024 it is very likely that my
tokens will not get highlighted (and I cannot retrieve my value in the
multivalue field).
Can anyone help me out here? is there something very obvious I am missing?
Is there an easy mechanism to just get the field that matched a query in a
multiValued one?
Thanks!
Rodolfo