RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-13 Thread Dyer, Jim
I think if you have _root_ in schema.xml you should look elsewhere.  My memory
is that merely adding this one line to schema.xml took care of our problem.

From: Flowerday, Matthew J 
Sent: Tuesday, January 12, 2021 3:23 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi Jim

Thanks for getting back to me.

I checked the schema.xml that we are using and it has the line you mentioned:



And this is the only reference (apart from within a comment) to _root_ in the
schema.xml. Does your schema.xml have further references to _root_ that I could
need? I also checked our solrconfig.xml file for any references to _root_ and
there are none.

Many Thanks

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830 | matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX


From: Dyer, Jim <james.d...@ingramcontent.com>
Sent: 11 January 2021 22:58
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

When we upgraded from 7.x to 8.x, I ran into an issue similar to yours:  when 
updating an existing document in the index, the document would be duplicated 
instead of replaced as expected.  The solution was to add a "_root_" field to 
schema.xml like this:
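
(The XML itself was stripped in archiving; the _root_ declaration as it ships in
Solr's stock schema, which is presumably the line meant here, is:)

<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />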



It appeared that when a feature was added for nested documents, this field 
somehow became mandatory in order for updates to work properly, at least in 
some cases.

From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: Saturday, January 9, 2021 4:44 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

As a test I stopped Solr and ran the IndexUpgrader tool on the database to see 
if this might fix the issue. It completed OK but unfortunately the issue still 
occurs - a new version of the record on solr is created rather than updating 
the original record.
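
For anyone following along, a typical IndexUpgrader invocation against a core's
index directory looks roughly like the following (jar locations and the index
path are assumptions, and Solr must be stopped first):

java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-8.7.0.jar:server/solr-webapp/webapp/WEB-INF/lib/lucene-backward-codecs-8.7.0.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/core/data/index

(Use ';' as the classpath separator on Windows.) As the test above shows, this
rewrites the segments to the 8.x format but does not change the update
behaviour being reported here.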

It looks to me as if the record created under 7.7.1 is somehow not being
'marked as deleted' in the way that records created under 8.7.0 are. Is there a
way for these records to be marked as deleted when they are updated?

Many Thanks

Matthew


Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830 | matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX


From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: 07 January 2021 12:25
To: solr-user@lucene.apache.org
Subject: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

I have recently upgraded a solr database from 7.7.1 to 8.7.0 without wiping the
database and re-indexing (as this would take too long to run on site).

On my local windows machine I have a single solr server 7.7.1 installation

I upgraded in the following manner


  *   Installed windows solr 8.7.0 on my machine in a different folder
  *   Copied the core related folder (holding conf, data, lib, core.properties) 
from 7.7.1 to the new 8.7.0 folder
  *   Brought up Solr
  *   Checked that queries work through the Solr Admin Tool and our application

This all worked fine until I tried to update a record which had been created
under 7.7.1. Instead of marking the old record as deleted, it effectively
created a new copy of the record with the change in it and left the old image
still visible. When I updated the record again it then correctly updated the
new 8.7.0 version without leaving the old image behind.

RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-11 Thread Dyer, Jim
When we upgraded from 7.x to 8.x, I ran into an issue similar to yours:  when 
updating an existing document in the index, the document would be duplicated 
instead of replaced as expected.  The solution was to add a "_root_" field to 
schema.xml like this:



It appeared that when a feature was added for nested documents, this field 
somehow became mandatory in order for updates to work properly, at least in 
some cases.

From: Flowerday, Matthew J 
Sent: Saturday, January 9, 2021 4:44 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

As a test I stopped Solr and ran the IndexUpgrader tool on the database to see 
if this might fix the issue. It completed OK but unfortunately the issue still 
occurs - a new version of the record on solr is created rather than updating 
the original record.

It looks to me as if the record created under 7.7.1 is somehow not being
'marked as deleted' in the way that records created under 8.7.0 are. Is there a
way for these records to be marked as deleted when they are updated?

Many Thanks

Matthew


Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830 | matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX


From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: 07 January 2021 12:25
To: solr-user@lucene.apache.org
Subject: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

I have recently upgraded a solr database from 7.7.1 to 8.7.0 without wiping the
database and re-indexing (as this would take too long to run on site).

On my local windows machine I have a single solr server 7.7.1 installation

I upgraded in the following manner


  *   Installed windows solr 8.7.0 on my machine in a different folder
  *   Copied the core related folder (holding conf, data, lib, core.properties) 
from 7.7.1 to the new 8.7.0 folder
  *   Brought up Solr
  *   Checked that queries work through the Solr Admin Tool and our application

This all worked fine until I tried to update a record which had been created
under 7.7.1. Instead of marking the old record as deleted, it effectively
created a new copy of the record with the change in it and left the old image
still visible. When I updated the record again it then correctly updated the
new 8.7.0 version without leaving the old image behind. If I created a new
record and then updated it, the solr record would be updated correctly. The
issue only seemed to affect the old 7.7.1 created records.

An example of the duplication as follows (the first record is 7.7.1 created 
version and the second record is the 8.7.0 version after carrying out an 
update):

{
  "responseHeader":{
"status":0,
"QTime":4,
"params":{
  "q":"id:9901020319M01-N26",
  "_":"1610016003669"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
  {
"id":"9901020319M01-N26",
"groupId":"9901020319M01",
"urn":"N26",
"specification":"nominal",
"owningGroupId":"9901020319M01",
"description":"N26, Yates, Mike, Alan, Richard, MALE",
"group_t":"9901020319M01",
"nominalUrn_t":"N26",
"dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
"dateTimeCreated_dt":"2020-12-30T12:00:53Z",
"title_t":"Captain",
"surname_t":"Yates",
"qualifier_t":"Voyager",
"forename1_t":"Mike",
"forename2_t":"Alan",
"forename3_t":"Richard",
"sex_t":"MALE",
"orderedType_t":"Nominal",
"_version_":1687507566832123904},
  {
"id":"9901020319M01-N26",
"groupId":"9901020319M01",
"urn":"N26",
"specification":"nominal",
"owningGroupId":"9901020319M01",
"description":"N26, Yates, Mike, Alan, Richard, MALE",
"group_t":"9901020319M01",
"nominalUrn_t":"N26",
"dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
"dateTimeCreated_dt":"2020-12-30T12:00:53Z",
"title_t":"Captain",
"surname_t":"Yates",
"qualifier_t":"Voyager enterprise defiant yorktown xx yy",
"forename1_t":"Mike",
"forename2_t":"Alan",
"forename3_t":"Richard",
"sex_t":"MALE",
"orderedType_t":"Nominal",
"_version_":1688224966566215680}]
  }}
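
One way to check whether the older 7.7.1 document is really being left live, as
opposed to being flagged deleted and awaiting a merge, is the Luke request
handler, which reports maxDoc, numDocs and deletedDocs for the core (core name
assumed here):

curl "http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json"

If deletedDocs stays at 0 after such an update, the old version was never
marked as deleted.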

I 

Re: Error when trying to create a core in solr

2020-06-09 Thread Jim Anderson
Hi Erick,

I probably should have included information about the config directory. As
part of the setup, I had copied the config directory as follows:

$ cp -r /usr/share/solr-8.5.1/server/solr/configsets/_default/* .

Note that the copy was from solr-8.5.1 because I could not find a
'_default' directory in solr-7.3.1.  Copying from 8.5.1 may well be my
problem.
I will check and see if I can find a 7.3.1 example directory to copy from.
I will report back.

Regards,
Jim

On Tue, Jun 9, 2020 at 10:22 AM Erick Erickson 
wrote:

> You need the entire config directory for a start, not just the schema file.
>
> And there’s no need to copy things around, just path to the nutch-provided
> config directory and you can leave off the “conf” since the upload process
> automatically checks for it and does the right thing.
>
> Best,
> Erick
>
> > On Jun 9, 2020, at 9:50 AM, Jim Anderson 
> wrote:
> >
> > Hi,
> >
> > I am running Solr-7.3.1. I have just untarred the Solr-7.3.1 area and
> > created a 'nutch' directory for the core. I have downloaded
> > nutch-master.zip from
> > https://github.com/apache/nutch, unzipped that file and copied
> schema.xml
> > to .../server/solr/configsets/nutch/conf/schema.xml
> >
> > In the schema file, I modified the lastModified file value to true, with
> no
> > other changes.
> >
> > I am running the following command:
> >
> > .../bin/solr create -c nutch -d .../server/solr/configsets/nutch/conf/
> >
> > and getting the error message:
> >
> > ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch]
> > Caused by: Illegal pattern component: pp
> >
> > I have done a search for an error message containing: "Illegal pattern
> > component: pp" but I did not find anything useful.
> >
> > Can anyone help explain what this error message means and/or what needs
> to
> > be done to fix this problem?
> >
> > Jim A.
>
>


Error when trying to create a core in solr

2020-06-09 Thread Jim Anderson
Hi,

I am running Solr-7.3.1. I have just untarred the Solr-7.3.1 area and
created a 'nutch' directory for the core. I have downloaded
nutch-master.zip from
https://github.com/apache/nutch, unzipped that file and copied schema.xml
to .../server/solr/configsets/nutch/conf/schema.xml

In the schema file, I modified the lastModified field value to true, with no
other changes.

I am running the following command:

.../bin/solr create -c nutch -d .../server/solr/configsets/nutch/conf/

and getting the error message:

ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch]
Caused by: Illegal pattern component: pp

I have done a search for an error message containing: "Illegal pattern
component: pp" but I did not find anything useful.

Can anyone help explain what this error message means and/or what needs to
be done to fix this problem?

Jim A.


Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
I cleared the Firefox cache and restarted and things are working ok now.

Jim

On Sun, Jun 7, 2020 at 3:44 PM Jim Anderson 
wrote:

> @Jan
>
> Thanks for the suggestion. I tried opera instead of firefox and it worked.
> I will try cleaner the cache on firefox, restart it and see if it works
> there.
>
> Jim
>
> On Sun, Jun 7, 2020 at 3:28 PM Jim Anderson 
> wrote:
>
>> An update.
>>
>> I started over by removing my Solr 7.3.1 installation and untarring again.
>>
>> Then went to the bin root directory and entered:
>>
>> bin/solr -start
>>
>> Next, I brought up the solr admin window and it still gives the same
>> error message and hangs up. As far as I can tell I am running solr straight
>> out of the box.
>>
>> Jim
>>
>> On Sun, Jun 7, 2020 at 3:07 PM Jim Anderson 
>> wrote:
>>
>>> >>> Did you install Solr with the installer script
>>>
>>> I was not aware that there is an install script. I will look for it, but
>>> if you can point me to it, that will help
>>>
>>> >>> or just
>>> >>> start it up after extracting the archive?
>>>
>>> I extracted the files from a tar ball and did a bit of setting up. For
>>> example, I created a core and modified my schema.xml file a bit.
>>>
>>> >> Does the solr/server/logs
>>> >> directory you mentioned contain files with timestamps that are
>>> current?
>>>
>>> The log files were current.
>>>
>>> >>> If you go to the "Logging" tab when the admin UI shows that error
>>>
>>> I cannot go to the "Logging" tab. When the admin UI comes up, it shows
>>> the error message and hangs with the cursor spinning.
>>>
>>> Thanks for the input. Again, if you can provide the install script, that
>>> will likely help. I'm going to go back and start with installing Solr again.
>>>
>>> Jim
>>>
>>>
>>>
>>> On Sun, Jun 7, 2020 at 1:09 PM Shawn Heisey  wrote:
>>>
>>>> On 6/7/2020 10:16 AM, Jim Anderson wrote:
>>>> > The admin pages comes up with:
>>>> >
>>>> > SolrCore Initialization Failures
>>>>
>>>> 
>>>>
>>>> > I look in my .../solr/server/logs directory and cannot find and
>>>> meaningful
>>>> > errors or warnings.
>>>> >
>>>> > Should I be looking elsewhere?
>>>>
>>>> That depends.  Did you install Solr with the installer script, or just
>>>> start it up after extracting the archive?  Does the solr/server/logs
>>>> directory you mentioned contain files with timestamps that are current?
>>>> If not, then the logs are likely going somewhere else.
>>>>
>>>> If you go to the "Logging" tab when the admin UI shows that error, you
>>>> will be able to see any log messages at WARN or higher severity.  Often
>>>> such log entries will need to be expanded by clicking on the little "i"
>>>> icon.  It will close again quickly, so you need to read fast.
>>>>
>>>> Thanks,
>>>> Shawn
>>>>
>>>


Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
@Jan

Thanks for the suggestion. I tried opera instead of firefox and it worked.
I will try clearing the cache on firefox, restart it and see if it works
there.

Jim

On Sun, Jun 7, 2020 at 3:28 PM Jim Anderson 
wrote:

> An update.
>
> I started over by removing my Solr 7.3.1 installation and untarring again.
>
> Then went to the bin root directory and entered:
>
> bin/solr -start
>
> Next, I brought up the solr admin window and it still gives the same error
> message and hangs up. As far as I can tell I am running solr straight out
> of the box.
>
> Jim
>
> On Sun, Jun 7, 2020 at 3:07 PM Jim Anderson 
> wrote:
>
>> >>> Did you install Solr with the installer script
>>
>> I was not aware that there is an install script. I will look for it, but
>> if you can point me to it, that will help
>>
>> >>> or just
>> >>> start it up after extracting the archive?
>>
>> I extracted the files from a tar ball and did a bit of setting up. For
>> example, I created a core and modified my schema.xml file a bit.
>>
>> >> Does the solr/server/logs
>> >> directory you mentioned contain files with timestamps that are
>> current?
>>
>> The log files were current.
>>
>> >>> If you go to the "Logging" tab when the admin UI shows that error
>>
>> I cannot go to the "Logging" tab. When the admin UI comes up, it shows
>> the error message and hangs with the cursor spinning.
>>
>> Thanks for the input. Again, if you can provide the install script, that
>> will likely help. I'm going to go back and start with installing Solr again.
>>
>> Jim
>>
>>
>>
>> On Sun, Jun 7, 2020 at 1:09 PM Shawn Heisey  wrote:
>>
>>> On 6/7/2020 10:16 AM, Jim Anderson wrote:
>>> > The admin pages comes up with:
>>> >
>>> > SolrCore Initialization Failures
>>>
>>> 
>>>
>>> > I look in my .../solr/server/logs directory and cannot find and
>>> meaningful
>>> > errors or warnings.
>>> >
>>> > Should I be looking elsewhere?
>>>
>>> That depends.  Did you install Solr with the installer script, or just
>>> start it up after extracting the archive?  Does the solr/server/logs
>>> directory you mentioned contain files with timestamps that are current?
>>> If not, then the logs are likely going somewhere else.
>>>
>>> If you go to the "Logging" tab when the admin UI shows that error, you
>>> will be able to see any log messages at WARN or higher severity.  Often
>>> such log entries will need to be expanded by clicking on the little "i"
>>> icon.  It will close again quickly, so you need to read fast.
>>>
>>> Thanks,
>>> Shawn
>>>
>>


Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
An update.

I started over by removing my Solr 7.3.1 installation and untarring again.

Then went to the bin root directory and entered:

bin/solr -start

Next, I brought up the solr admin window and it still gives the same error
message and hangs up. As far as I can tell I am running solr straight out
of the box.

Jim

On Sun, Jun 7, 2020 at 3:07 PM Jim Anderson 
wrote:

> >>> Did you install Solr with the installer script
>
> I was not aware that there is an install script. I will look for it, but
> if you can point me to it, that will help
>
> >>> or just
> >>> start it up after extracting the archive?
>
> I extracted the files from a tar ball and did a bit of setting up. For
> example, I created a core and modified my schema.xml file a bit.
>
> >> Does the solr/server/logs
> >> directory you mentioned contain files with timestamps that are current?
>
> The log files were current.
>
> >>> If you go to the "Logging" tab when the admin UI shows that error
>
> I cannot go to the "Logging" tab. When the admin UI comes up, it shows the
> error message and hangs with the cursor spinning.
>
> Thanks for the input. Again, if you can provide the install script, that
> will likely help. I'm going to go back and start with installing Solr again.
>
> Jim
>
>
>
> On Sun, Jun 7, 2020 at 1:09 PM Shawn Heisey  wrote:
>
>> On 6/7/2020 10:16 AM, Jim Anderson wrote:
>> > The admin pages comes up with:
>> >
>> > SolrCore Initialization Failures
>>
>> 
>>
>> > I look in my .../solr/server/logs directory and cannot find and
>> meaningful
>> > errors or warnings.
>> >
>> > Should I be looking elsewhere?
>>
>> That depends.  Did you install Solr with the installer script, or just
>> start it up after extracting the archive?  Does the solr/server/logs
>> directory you mentioned contain files with timestamps that are current?
>> If not, then the logs are likely going somewhere else.
>>
>> If you go to the "Logging" tab when the admin UI shows that error, you
>> will be able to see any log messages at WARN or higher severity.  Often
>> such log entries will need to be expanded by clicking on the little "i"
>> icon.  It will close again quickly, so you need to read fast.
>>
>> Thanks,
>> Shawn
>>
>


Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
 >>> Did you install Solr with the installer script

I was not aware that there is an install script. I will look for it, but if
you can point me to it, that will help
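
For reference, the installer script ships inside the binary tarball itself; a
sketch of the steps documented in the Solr Reference Guide, with the 7.3.1
tarball name assumed:

tar xzf solr-7.3.1.tgz solr-7.3.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-7.3.1.tgz

This installs Solr as a Linux service; it is optional for a local tutorial
setup like the one described in this thread.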

>>> or just
>>> start it up after extracting the archive?

I extracted the files from a tar ball and did a bit of setting up. For
example, I created a core and modified my schema.xml file a bit.

>> Does the solr/server/logs
>> directory you mentioned contain files with timestamps that are current?

The log files were current.

>>> If you go to the "Logging" tab when the admin UI shows that error

I cannot go to the "Logging" tab. When the admin UI comes up, it shows the
error message and hangs with the cursor spinning.

Thanks for the input. Again, if you can provide the install script, that
will likely help. I'm going to go back and start with installing Solr again.

Jim



On Sun, Jun 7, 2020 at 1:09 PM Shawn Heisey  wrote:

> On 6/7/2020 10:16 AM, Jim Anderson wrote:
> > The admin pages comes up with:
> >
> > SolrCore Initialization Failures
>
> 
>
> > I look in my .../solr/server/logs directory and cannot find and
> meaningful
> > errors or warnings.
> >
> > Should I be looking elsewhere?
>
> That depends.  Did you install Solr with the installer script, or just
> start it up after extracting the archive?  Does the solr/server/logs
> directory you mentioned contain files with timestamps that are current?
> If not, then the logs are likely going somewhere else.
>
> If you go to the "Logging" tab when the admin UI shows that error, you
> will be able to see any log messages at WARN or higher severity.  Often
> such log entries will need to be expanded by clicking on the little "i"
> icon.  It will close again quickly, so you need to read fast.
>
> Thanks,
> Shawn
>


Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
Hi,

I'm a newbie with Solr, and going through tutorials and trying to get Solr
working with Nutch.

Today, I started up Solr and then brought up Solr Admin at:

http://localhost:8983/solr/

The admin page comes up with:

SolrCore Initialization Failures

   - *{{core}}:* {{error}}

Please check your logs for more information


I look in my .../solr/server/logs directory and cannot find any meaningful
errors or warnings.


Should I be looking elsewhere?

Jim A.


Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Erick,

Thanks for the clarification on the JVM heap space. I will invoke java as
you advise.

The program that I am writing is a java example that I took off the
internet. The intent of the example is to read an existing core stored in
solr. I created the core using instructions that I found in a tutorial. I
think the example from the tutorial worked ok, because I can see the core
in solr that was created using nutch. So I think my status is that I have a
good core, and I was trying to read and print out the documents in that
core.

My current plan is to try to find and install Nutch 1.17 and then clear and
reinstall solr 8.5.1 and start over again with a clean slate.

Regards,
Jim


On Sat, Jun 6, 2020 at 10:25 AM Erick Erickson 
wrote:

> I’m not talking about how much memory your machine has,
> the critical bit it’s how much heap space is allocated to the
> JVM to run your app.
>
> You can increase it by specifying -Xmx2G say when you
> invoke Java.
>
> The version difference is suspicious indeed. I’m a little
> confused here. Exactly _what_ program is crashing? An
> independent app you wrote or nutch? If the former, you could
> try compiling your Java app against the Solr jars provided
> with the Solr version that ships with Nutch 1.16 (Solr 7.3.1?).
>
> Best,
> Erick
>
> > On Jun 6, 2020, at 9:30 AM, Jim Anderson 
> wrote:
> >
> > Erick,
> >
> > Thanks for the suggestion. I will keep it in the back of my mind for now.
> > My PC has 8 G-bytes of memory and has roughly 4 G-bytes in use.
> >
> > If the forefront, I'm looking at the recommended solr/nutch combinations.
> > I'm using Solr 8.5.1 with nutch 1.16. The recommendation is to use nutch
> > 1.17 with Solr 8.5.1, but 1.17 has not been released for download.
> > Consequently, I used nutch 1.16. I'm not sure that will make a
> difference,
> > but I am suspicious.
> >
> > Jim
> >
> > On Sat, Jun 6, 2020 at 9:18 AM Erick Erickson 
> > wrote:
> >
> >> I’d look for an OutOfMemory problem before going too much farther.
> >> The simplest way to see if that’s in the right direction would be to
> >> run your SolrJ program with a massive memory size. Perhaps monitor
> >> your program with jconsole or similar to see if there’s any clues about
> >> memory usage.
> >>
> >> OOMs lead to unpredictable behavior, so it’s at least a possibility that
> >> this is the root cause. If so, there’s nothing SolrJ can do about it
> >> exactly
> >> because the state of a program is indeterminate afterwards, even if the
> >> OOM is caught somewhere. I suppose you could also try to catch that
> >> exception in the top-level of your program.
> >>
> >> I’m assuming a stand-alone program here, if you’re running some custom
> >> code in Solr itself, make sure the oom-killer script is running.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Jun 6, 2020, at 8:23 AM, Jim Anderson 
> >> wrote:
> >>>
> >>> Shawn,
> >>>
> >>> Thanks for the explanation. Very good response.
> >>>
> >>> The first paragraph helped clarify what a collection is. I have read
> >> quite
> >>> about about Solr. There is so much to absorb that it is slowly sinking
> >> in.
> >>> Your 2nd paragraph definitely answered my question, i.e. passing a core
> >>> name should be ok when a collection name is specified as a method
> >> argument.
> >>> This is what I did.
> >>>
> >>> Regarding the 3rd paragraph, it is good to know that Solrj is fairly
> >> robust
> >>> and should not be crashing. Nevertheless, that is what is happening.
> The
> >>> call to client.query() is wrapped in a try/catch sequence. Apparently
> no
> >>> exceptions were detected, or the program crashed before the exception
> >> could
> >>> be raised.
> >>>
> >>> My next step is to check where I can report this to the Solr folks and
> >> see
> >>> if they can figure out what it is crashing. BTW, I had not checked my
> >>> output file before this morning. The output file indicates that the
> >> program
> >>> ran to completion, so I am guessing that at least one other thread is
> >> being
> >>> created and that that  thread is crashing.
> >>>
> >>> Regards,
> >>> Jim
> >>>
> >>> On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey 
> >> wrote:
> >>>
> >>>> On 6/5/2020 4:24 PM

Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Erick,

Thanks for the suggestion. I will keep it in the back of my mind for now.
My PC has 8 G-bytes of memory and has roughly 4 G-bytes in use.

In the forefront, I'm looking at the recommended solr/nutch combinations.
I'm using Solr 8.5.1 with nutch 1.16. The recommendation is to use nutch
1.17 with Solr 8.5.1, but 1.17 has not been released for download.
Consequently, I used nutch 1.16. I'm not sure that will make a difference,
but I am suspicious.

Jim

On Sat, Jun 6, 2020 at 9:18 AM Erick Erickson 
wrote:

> I’d look for an OutOfMemory problem before going too much farther.
> The simplest way to see if that’s in the right direction would be to
> run your SolrJ program with a massive memory size. Perhaps monitor
> your program with jconsole or similar to see if there’s any clues about
> memory usage.
>
> OOMs lead to unpredictable behavior, so it’s at least a possibility that
> this is the root cause. If so, there’s nothing SolrJ can do about it
> exactly
> because the state of a program is indeterminate afterwards, even if the
> OOM is caught somewhere. I suppose you could also try to catch that
> exception in the top-level of your program.
>
> I’m assuming a stand-alone program here, if you’re running some custom
> code in Solr itself, make sure the oom-killer script is running.
>
> Best,
> Erick
>
> > On Jun 6, 2020, at 8:23 AM, Jim Anderson 
> wrote:
> >
> > Shawn,
> >
> > Thanks for the explanation. Very good response.
> >
> > The first paragraph helped clarify what a collection is. I have read
> quite
> > about about Solr. There is so much to absorb that it is slowly sinking
> in.
> > Your 2nd paragraph definitely answered my question, i.e. passing a core
> > name should be ok when a collection name is specified as a method
> argument.
> > This is what I did.
> >
> > Regarding the 3rd paragraph, it is good to know that Solrj is fairly
> robust
> > and should not be crashing. Nevertheless, that is what is happening. The
> > call to client.query() is wrapped in a try/catch sequence. Apparently no
> > exceptions were detected, or the program crashed before the exception
> could
> > be raised.
> >
> > My next step is to check where I can report this to the Solr folks and
> see
> > if they can figure out what it is crashing. BTW, I had not checked my
> > output file before this morning. The output file indicates that the
> program
> > ran to completion, so I am guessing that at least one other thread is
> being
> > created and that that  thread is crashing.
> >
> > Regards,
> > Jim
> >
> > On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey 
> wrote:
> >
> >> On 6/5/2020 4:24 PM, Jim Anderson wrote:
> >>> I am running my first solrj program and it is crashing when I call the
> >>> method
> >>>
> >>> client.query("coreName",queryParms)
> >>>
> >>> The API doc says the string should be a collection. I'm still not sure
> >>> about the difference between a collection and a core, so what I am
> doing
> >> is
> >>> likely illegal. Given that I have created a core, create a collection
> >> from
> >>> it so that I can truly pass a collection name to the query function?
> >>
> >> The concept of a collection comes from SolrCloud.  A collection is made
> >> up of one or more shards.  A shard is made up of one or more replicas.
> >> Each replica is a core.  If you're not running SolrCloud, then you do
> >> not have collections.
> >>
> >> Wherever SolrJ docs says "collection" as a parameter for a request, it
> >> is likely that you can think "core" instead and have it still be
> >> correct.  If you're running SolrCloud, you'll want to be very careful to
> >> know the difference.
> >>
> >> It seems very odd for a SolrJ query to cause the program to crash.  It
> >> would be pretty common for it to throw an exception, but that's not the
> >> same as a crash, unless exception handling is incorrect or missing.
> >>
> >> Thanks,
> >> Shawn
> >>
>
>


Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Shawn,

Thanks for the explanation. Very good response.

The first paragraph helped clarify what a collection is. I have read quite
a bit about Solr. There is so much to absorb that it is slowly sinking in.
Your 2nd paragraph definitely answered my question, i.e. passing a core
name should be ok when a collection name is specified as a method argument.
This is what I did.

Regarding the 3rd paragraph, it is good to know that Solrj is fairly robust
and should not be crashing. Nevertheless, that is what is happening. The
call to client.query() is wrapped in a try/catch sequence. Apparently no
exceptions were detected, or the program crashed before the exception could
be raised.

My next step is to check where I can report this to the Solr folks and see
if they can figure out why it is crashing. BTW, I had not checked my
output file before this morning. The output file indicates that the program
ran to completion, so I am guessing that at least one other thread is being
created and that that thread is crashing.

Regards,
Jim

On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey  wrote:

> On 6/5/2020 4:24 PM, Jim Anderson wrote:
> > I am running my first solrj program and it is crashing when I call the
> > method
> >
> > client.query("coreName",queryParms)
> >
> > The API doc says the string should be a collection. I'm still not sure
> > about the difference between a collection and a core, so what I am doing
> is
> > likely illegal. Given that I have created a core, create a collection
> from
> > it so that I can truly pass a collection name to the query function?
>
> The concept of a collection comes from SolrCloud.  A collection is made
> up of one or more shards.  A shard is made up of one or more replicas.
> Each replica is a core.  If you're not running SolrCloud, then you do
> not have collections.
>
> Wherever SolrJ docs says "collection" as a parameter for a request, it
> is likely that you can think "core" instead and have it still be
> correct.  If you're running SolrCloud, you'll want to be very careful to
> know the difference.
>
> It seems very odd for a SolrJ query to cause the program to crash.  It
> would be pretty common for it to throw an exception, but that's not the
> same as a crash, unless exception handling is incorrect or missing.
>
> Thanks,
> Shawn
>


SolrClient.query take a 'collection' argument

2020-06-05 Thread Jim Anderson
I am running my first solrj program and it is crashing when I call the
method

client.query("coreName",queryParms)

The API doc says the string should be a collection. I'm still not sure
about the difference between a collection and a core, so what I am doing is
likely illegal. Given that I have created a core, can I create a collection
from it so that I can truly pass a collection name to the query function?

Jim A.
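
A minimal SolrJ sketch for the standalone (non-SolrCloud) case discussed in this
thread; the core name "nutch" and the URL are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class QueryCoreExample {
    public static void main(String[] args) throws Exception {
        // The base URL names the core directly, so no collection/core argument
        // is needed on query(); alternatively stop the URL at /solr and call
        // client.query("nutch", query).
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/nutch").build()) {
            SolrQuery query = new SolrQuery("*:*");
            query.setRows(10);
            QueryResponse response = client.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
    }
}

If heap pressure is suspected (see the OutOfMemory discussion earlier in the
thread), a larger heap can be requested at launch, e.g. java -Xmx2g -cp ... QueryCoreExample.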


Re: Building a web based search engine

2020-06-02 Thread Jim Anderson
Markus,

Thanks for your replies. I will review them and experiment more and see if I
can get everything working.

Jim

On Tue, Jun 2, 2020 at 2:36 PM Markus Jelsma 
wrote:

> Hello, see inline.
>
> Markus
>
> -Original message-
> > From:Jim Anderson 
> > Sent: Tuesday 2nd June 2020 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Building a web based search engine
> >
> > Hi Markus,
> >
> > Thanks for your response. I appreciate you giving me the bullet list of
> > things to do. I can take that list and work from it and hopefully make
> > progress, but I don't think it will get me where I want to be - just a
> bit
> > closer.
> >
> > You say, "We have been building precisely that for over ten years now".
> Is
> > it in a document? I would like to read it.
>
> No, i haven't written a book about it and don't intend to.
>
> > Some basic things I would like to know that should be documented:
> >
> > 1) Using nutch as the crawler, how do I run a nutch thread that crawls my
> > named URLs.
>
> You don't, but run Nutch as a separate process from the command line. Or
> when you have to deal with 50+ million records, you run it on Apache Hadoop.
>
> > 2) I will use nutch to visit websites and create documents in solr. How
> do
> > I verify that documents have been created in Solr via nutch?
>
> By searching for them using Solr, or retrieving them by URL, using Solr's
> simple HTTP API. You can use SolrJ, the Java client, too.
>
> > 3) Solr will store and index the documents. How do I verify the index?
>
> See 2.
>
> > 4) I assume I can run a tomcat server on my host and then provide a
> > localhost URI to my web browser. Tomcat will then forward the URI to my
> > application. My application will take a query and using a java API is
> will
> > pass the query to Solr. I would like to see an example of a java program
> > passing a query to Solr.
>
> See 3. Though i would recommend to use Solr's HTTP API, it is much easier
> to deal with.
>
> > 5) Solr will take the query, parse it and then locate appropriate
> documents
> > using the index. Is there a log in Solr showing what queries have been
> > parsed?
>
> Yes, see Solr's log directory.
>
> > 6) Solr will pass back the list of documents it has located. I have not
> > really looked at this issue yet, but it would be nice to have an example
> of
> > this.
>
> Search for a SolrJ tutorial, they are plentiful. Also check out Solr's own
> extensive manual, everything you need is there.
>
> > Jim
> >
> >
> >
> > On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello,
> > >
> > > We have been building precisely that for over ten years now. The
> '10,000
> > > foot level overview' is basically:
> > >
> > > * forget about Lucene for now, Solr uses it under the hood;
> > > * get Solr, and start it with the schema.xml file that comes with
> Nutch;
> > > * get Nutch, give it a set of domains or hosts to crawl and some URLs
> to
> > > start the crawl with and point the indexer towards the previously
> > > configured Solr;
> > > * put a proxy in front of Solr (we use Nginx), or skip this step if it
> is
> > > just an internal demo (do not expose Solr to the outside world);
> > > * make some basic JS tool that handles input and search result
> responses.
> > >
> > > This was our first web search engine prototype and it was set up in a
> few
> > > days. The chapter "How To Build A Web Based Search Engine With Solr,
> Lucene
> > > and Nutch" just means: set up Solr, and point Nutch towards it, and
> tell it
> > > to start crawling and indexing.
> > >
> > > Then there comes and endless list of things to improve, autocomplete,
> > > spell checking, query and click log handling and analysis, proper text
> > > extraction, etc.
> > >
> > > Regards,
> > > Markus
> > >
> > > -Original message-
> > > > From:Jim Anderson 
> > > > Sent: Tuesday 2nd June 2020 16:36
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Building a web based search engine
> > > >
> > > > Hi,
> > > >
> > > > I have been looking at solr, lucene and nutch websites and
> tutuorials for
> > > > over a week now, experimenting and learning, but also frustrated be
> the
> > > > fact the I 

Re: Building a web based search engine

2020-06-02 Thread Jim Anderson
Hi Markus,

Thanks for your response. I appreciate you giving me the bullet list of
things to do. I can take that list and work from it and hopefully make
progress, but I don't think it will get me where I want to be - just a bit
closer.

You say, "We have been building precisely that for over ten years now". Is
it in a document? I would like to read it.

Some basic things I would like to know that should be documented:

1) Using nutch as the crawler, how do I run a nutch thread that crawls my
named URLs.
2) I will use nutch to visit websites and create documents in solr. How do
I verify that documents have been created in Solr via nutch?
3) Solr will store and index the documents. How do I verify the index? (A
sample verification query is sketched just after this list.)
4) I assume I can run a tomcat server on my host and then provide a
localhost URI to my web browser. Tomcat will then forward the URI to my
application. My application will take a query and, using a java API, it will
pass the query to Solr. I would like to see an example of a java program
passing a query to Solr.
5) Solr will take the query, parse it and then locate appropriate documents
using the index. Is there a log in Solr showing what queries have been
parsed?
6) Solr will pass back the list of documents it has located. I have not
really looked at this issue yet, but it would be nice to have an example of
this.
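
A sample of the kind of verification referred to in items 2 and 3 above: once
Nutch has indexed, Solr's HTTP API can be queried directly, e.g. for a core
named "nutch" (name assumed) on the default port:

curl "http://localhost:8983/solr/nutch/select?q=*:*&rows=5&wt=json"

numFound in the response shows how many documents Nutch has pushed in, and the
returned documents show what was actually stored.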

Jim



On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma 
wrote:

> Hello,
>
> We have been building precisely that for over ten years now. The '10,000
> foot level overview' is basically:
>
> * forget about Lucene for now, Solr uses it under the hood;
> * get Solr, and start it with the schema.xml file that comes with Nutch;
> * get Nutch, give it a set of domains or hosts to crawl and some URLs to
> start the crawl with and point the indexer towards the previously
> configured Solr;
> * put a proxy in front of Solr (we use Nginx), or skip this step if it is
> just an internal demo (do not expose Solr to the outside world);
> * make some basic JS tool that handles input and search result responses.
>
> This was our first web search engine prototype and it was set up in a few
> days. The chapter "How To Build A Web Based Search Engine With Solr, Lucene
> and Nutch" just means: set up Solr, and point Nutch towards it, and tell it
> to start crawling and indexing.
>
> Then there comes and endless list of things to improve, autocomplete,
> spell checking, query and click log handling and analysis, proper text
> extraction, etc.
>
> Regards,
> Markus
>
> -Original message-
> > From:Jim Anderson 
> > Sent: Tuesday 2nd June 2020 16:36
> > To: solr-user@lucene.apache.org
> > Subject: Building a web based search engine
> >
> > Hi,
> >
> > I have been looking at solr, lucene and nutch websites and tutuorials for
> > over a week now, experimenting and learning, but also frustrated be the
> > fact the I am totally missing the 'how to' do what I want. I see a lot of
> > examples of how to use each of the tools, but not how to put them all
> > together. I think an 'overview' at the 10,000 foot level is needed, Maybe
> > one is available and I have not yet found it. If someone can point me to
> > one, please do.
> >
> > If I am correct that an overview on "How To Build A Web Based Search
> Engine
> > With Solr, Lucene and Nutch" is not available, then I will be willing to
> > write an overview and make it available to the Solr community.  I will
> need
> > input, explanation and review of others.
> >
> > My 2 goals are:
> >
> > 1) Build a demo web based search engine [Note: I have a very specific
> > business need to able to demonstrate a web application on top of a search
> > engine. This demo is intended to show a 'proof of concept' of the web
> > application to a small audience.]
> >
> > 2) Document the process of building the demo and customizing it using the
> > java API so that others can more easily build their own web base search
> > engine.
> >
> > Jim Anderson
> >
>


Building a web based search engine

2020-06-02 Thread Jim Anderson
Hi,

I have been looking at solr, lucene and nutch websites and tutorials for
over a week now, experimenting and learning, but also frustrated by the
fact that I am totally missing the 'how to' do what I want. I see a lot of
examples of how to use each of the tools, but not how to put them all
together. I think an 'overview' at the 10,000 foot level is needed. Maybe
one is available and I have not yet found it. If someone can point me to
one, please do.

If I am correct that an overview on "How To Build A Web Based Search Engine
With Solr, Lucene and Nutch" is not available, then I will be willing to
write an overview and make it available to the Solr community.  I will need
input, explanation and review from others.

My 2 goals are:

1) Build a demo web based search engine [Note: I have a very specific
business need to be able to demonstrate a web application on top of a search
engine. This demo is intended to show a 'proof of concept' of the web
application to a small audience.]

2) Document the process of building the demo and customizing it using the
java API so that others can more easily build their own web base search
engine.

Jim Anderson



[ANNOUNCE] Apache Solr 8.0.0 released

2019-03-14 Thread jim ferenczi
14 March 2019, Apache Solr™ 8.0.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 8.0.0

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

The release is available for immediate download at:

http://www.apache.org/dyn/closer.lua/lucene/solr/8.0.0

Please read CHANGES.txt for a detailed list of changes:

https://lucene.apache.org/solr/8_0_0/changes/Changes.html

Solr 8.0.0 Release Highlights
* Solr now uses HTTP/2 for inter-node communication

Being a major release, Solr 8 removes many deprecated APIs, changes various
parameter defaults and behavior. Some changes may require a re-index of
your content. You are thus encouraged to thoroughly read the "Upgrade
Notes" at http://lucene.apache.org/solr/8_0_0/changes/Changes.html or in
the CHANGES.txt file accompanying the release.

Solr 8.0 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


[ANNOUNCE] Apache Solr 7.7.0 released

2019-02-11 Thread jim ferenczi
11 February 2019, Apache Solr™ 7.7.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 7.7.0

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial
search. Solr is highly scalable, providing fault tolerant distributed
search and indexing, and powers the search and navigation features of many
of the world's largest internet sites.

Solr 7.7.0 is available for immediate download at:
http://lucene.apache.org/solr/downloads.html

See http://lucene.apache.org/solr/7_7_0/changes/Changes.html for a full
list of details.

Solr 7.7.0 Release Highlights:

Bug Fixes:
  * URI Too Long with large streaming expressions in SolrJ.
  * A failure while reloading a SolrCore can result in the SolrCore not
being closed.
  * Spellcheck parameters not working in new UI.
  * New Admin UI Query does not URL-encode the query produced in the URL
box.
  * Rule-base Authorization plugin skips authorization if querying node
does not have collection replica.
  * Solr installer fails on SuSE linux.
  * Fix incorrect SOLR_SSL_KEYSTORE_TYPE variable in solr start script.

Improvements:
  * JSON 'terms' Faceting now supports a 'prelim_sort' option to use when
initially selecting the top ranking buckets, prior to the final 'sort'
option used after refinement.
  * Add a login page to Admin UI, with initial support for Basic Auth and
Kerberos.
  * New Node-level health check handler at /admin/info/healthcheck and
/node/health paths that checks if the node is live, connected to zookeeper
and not shutdown.
  * It is now possible to configure a host whitelist for distributed search.

You are encouraged to thoroughly read the "Upgrade Notes" at
http://lucene.apache.org/solr/7_7_0/changes/Changes.html or in the
CHANGES.txt file accompanying the release.

Solr 7.7 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/community.html#mailing-lists-irc)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


[ANNOUNCE] Apache Solr 7.5.0 released

2018-09-24 Thread jim ferenczi
24 September 2018, Apache Solr™ 7.5.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 7.5.0

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial
search. Solr is highly scalable, providing fault tolerant distributed
search and indexing, and powers the search and navigation features of many
of the world's largest internet sites.

Solr 7.5.0 is available for immediate download at:
http://lucene.apache.org/solr/downloads.html

See http://lucene.apache.org/solr/7_5_0/changes/Changes.html for a full
list of details.

Solr 7.5.0 Release Highlights:

  Nested/child documents may now be supplied as a field value instead of
stand-off. Future releases will leverage this semantic information.
  Enhance Autoscaling policy support to equally distribute replicas on the
basis of arbitrary properties.
  Nodes are now visible inside a view of the Admin UI "Cloud" tab, listing
nodes and key metrics.
  The status of zookeeper ensemble is now accessible under the Admin UI
Cloud tab.
  The new Korean morphological analyzer ("nori") has been added to default
distribution.

You are encouraged to thoroughly read the "Upgrade Notes" at
http://lucene.apache.org/solr/7_5_0/changes/Changes.html or in the
CHANGES.txt file accompanying the release.

Solr 7.5 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/community.html#mailing-lists-irc)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


Re: Facet Range with Stats

2018-05-11 Thread Jim Freeby
Correction.  The solution below did not quite get what we need.
I need the stats reports for the range.
I'll keep digging on this one.

On Friday, May 11, 2018 10:59:45 AM PDT, Jim Freeby
<jamesfre...@yahoo.com.INVALID> wrote:
 
  I found a solution.
If I use tags for the facet range definition and the stats definition, I can 
include it in the facet pivot
stats=true
stats.field={!tag=piv1 percentiles='50'}price
facet=true
facet.range={!tag=r1}someDate
f.someDate.facet.range.start=2018-01-01T00:00:00Z
f.someDate.facet.range.end=2018-04-30T00:00:00Z
f.someDate.facet.range.gap=+1MONTH
facet.pivot={!stats=piv1 range=r1}category
Please let me know if there's a better way to achieve this.
Cheers,

Jim

On Friday, May 11, 2018 09:23:39 AM PDT, Jim Freeby
<jamesfre...@yahoo.com.INVALID> wrote:
 
 All,
I'd like to generate stats for the results of a facet range.
For example, calculate the mean sold price over a range of months.
Does anyone know how to do this?
This Jira issue seems to indicate it's not yet possible.
[SOLR-6352] Let Stats Hang off of Range Facets - ASF JIRA



Thanks,  

Jim    

Re: Facet Range with Stats

2018-05-11 Thread Jim Freeby
 I found a solution.
If I use tags for the facet range definition and the stats definition, I can 
include it in the facet pivot
stats=true
stats.field={!tag=piv1 percentiles='50'}price
facet=true
facet.range={!tag=r1}someDate
f.someDate.facet.range.start=2018-01-01T00:00:00Z
f.someDate.facet.range.end=2018-04-30T00:00:00Z
f.someDate.facet.range.gap=+1MONTH
facet.pivot={!stats=piv1 range=r1}category
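
For reference, the same parameters assembled into a single request; the core
name is an assumption, and --data-urlencode takes care of the braces and '+':

curl -G "http://localhost:8983/solr/mycore/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "rows=0" \
  --data-urlencode "stats=true" \
  --data-urlencode "stats.field={!tag=piv1 percentiles='50'}price" \
  --data-urlencode "facet=true" \
  --data-urlencode "facet.range={!tag=r1}someDate" \
  --data-urlencode "f.someDate.facet.range.start=2018-01-01T00:00:00Z" \
  --data-urlencode "f.someDate.facet.range.end=2018-04-30T00:00:00Z" \
  --data-urlencode "f.someDate.facet.range.gap=+1MONTH" \
  --data-urlencode "facet.pivot={!stats=piv1 range=r1}category"
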
Please let me know if there's a better way to achieve this.
Cheers,

Jim

On Friday, May 11, 2018 09:23:39 AM PDT, Jim Freeby
<jamesfre...@yahoo.com.INVALID> wrote:
 
 All,
I'd like to generate stats for the results of a facet range.
For example, calculate the mean sold price over a range of months.
Does anyone know how to do this?
This Jira issue seems to indicate it's not yet possible.
[SOLR-6352] Let Stats Hang off of Range Facets - ASF JIRA



Thanks,  

Jim  

Facet Range with Stats

2018-05-11 Thread Jim Freeby
All,
I'd like to generate stats for the results of a facet range.
For example, calculate the mean sold price over a range of months.
Does anyone know how to do this?
This Jira issue seems to indicate it's not yet possible.
[SOLR-6352] Let Stats Hang off of Range Facets - ASF JIRA



Thanks,  

Jim


Re: Median Date

2018-05-02 Thread Jim Freeby
 All,
percentiles only work with numbers, not dates.
If I use the ms function, I can get the number of milliseconds between NOW and 
the import date.  Then we can use that result in calculating the median age of 
the documents using percentiles.
rows=0&stats=true&stats.field={!tag=piv1 percentiles='50' func}ms(NOW,importDate)&facet=true&facet.pivot={!stats=piv1}status
I hope this helps someone else :)  Also, let me know if there's a better way to 
do this.
Cheers,

Jim


On Tuesday, May 1, 2018 03:27:10 PM PDT, Jim Freeby
<jamesfre...@yahoo.com.INVALID> wrote:
 
 All,
We have a dateImported field in our schema.
I'd like to generate a statistic showing the median dateImported (actually we 
want median age of the documents, based on the dateImported value).
I have other stats that calculate the median value of numbers (like price).
This was achieved with something like:
rows=0&stats=true&stats.field={!tag=piv1 percentiles='50'}price&facet=true&facet.pivot={!stats=piv1}status
I have not found a way to calculate the median dateImported.  The mean works, 
but we  need median.
Any help would be appreciated?
Cheers,

Jim  

Median Date

2018-05-01 Thread Jim Freeby
All,
We have a dateImported field in our schema.
I'd like to generate a statistic showing the median dateImported (actually we 
want median age of the documents, based on the dateImported value).
I have other stats that calculate the median value of numbers (like price).
This was achieved with something like:
rows=0&stats=true&stats.field={!tag=piv1 percentiles='50'}price&facet=true&facet.pivot={!stats=piv1}status
I have not found a way to calculate the median dateImported.  The mean works, 
but we  need median.
Any help would be appreciated?
Cheers,

Jim


[ANNOUNCE] Apache Solr 7.2.1 released

2018-01-15 Thread jim ferenczi
15 January 2018, Apache Solr™ 7.2.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 7.2.1

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

This release includes 3 bug fixes since the 7.2.0 release:

* Overseer can never process some last messages.

* Rename core in solr standalone mode is not persisted.

* QueryComponent's rq parameter parsing no longer considers the defType
parameter.

* Fix NPE in SolrQueryParser when the query terms inside a filter clause
reduce to nothing.

Furthermore, this release includes Apache Lucene 7.2.1 which includes 1 bug
fix since the 7.2.0 release.

The release is available for immediate download at:

http://www.apache.org/dyn/closer.lua/lucene/solr/7.2.1

Please read CHANGES.txt for a detailed list of changes:

https://lucene.apache.org/solr/7_2_1/changes/Changes.html

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


[ANNOUNCE] Apache Solr 6.5.1 released

2017-04-27 Thread jim ferenczi
27 April 2017, Apache Solr™ 6.5.1 available


The Lucene PMC is pleased to announce the release of Apache Solr 6.5.1


Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.


This release includes 11 bug fixes since the 6.5.0 release. Some of the
major fixes are:


* bin\solr.cmd delete and healthcheck now works again; fixed continuation
chars ^


* Fix debug related NullPointerException in solr/contrib/ltr
OriginalScoreFeature class.


* The JSON output of /admin/metrics is fixed to write the container as a
map (SimpleOrderedMap) instead of an array (NamedList).


* On 'downnode', lots of wasteful mutations are done to ZK.


* Fix params persistence for solr/contrib/ltr (MinMax|Standard)Normalizer
classes.


* The fetch() streaming expression wouldn't work if a value included query
syntax chars (like :+-). Fixed, and enhanced the generated query to not
pollute the queryCache.


* Disable graph query production via schema configuration
<fieldtype ... enableGraphQueries="false" ...>. This fixes broken queries for
ShingleFilter-containing query-time analyzers when request param sow=false.


* Fix indexed="false" on numeric PointFields


* SQL AVG function mis-interprets field type.


* SQL interface does not use client cache.


* edismax with sow=false fails to create dismax-per-term queries when any
field is boosted.


Furthermore, this release includes Apache Lucene 6.5.1 which includes 3 bug
fixes since the 6.5.0 release.


The release is available for immediate download at:


http://www.apache.org/dyn/closer.lua/lucene/solr/6.5.1

Please read CHANGES.txt for a detailed list of changes:


https://lucene.apache.org/solr/6_5_1/changes/Changes.html

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)


Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


[ANNOUNCE] Apache Solr 6.5.0 released

2017-03-27 Thread jim ferenczi
27 March 2017, Apache Solr 6.5.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 6.5.0.

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

Solr 6.5.0 is available for immediate download at:

   http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

   https://lucene.apache.org/solr/6_5_0/changes/Changes.html

Highlights of this Solr release include:

   - PointFields (fixed-width multi-dimensional numeric & binary types
   enabling fast range search) are now supported
   - In-place updates to numeric docValues fields (single valued,
   non-stored, non-indexed) supported using atomic update syntax
   - A new LatLonPointSpatialField that uses points or doc values for query
   - It is now possible to declare a field as "large" in order to bypass
   the document cache
   - New sow=false request param (split-on-whitespace) for edismax &
   standard query parsers enables query-time multi-term synonyms
   - XML QueryParser (defType=xmlparser) now supports span queries
   - hl.maxAnalyzedChars now have consistent default across highlighters
   - UnifiedSolrHighlighter and PostingsSolrHighlighter now support
   CustomSeparatorBreakIterator
   - Scoring formula is adjusted for the scoreNodes function
   - Calcite Planner now applies constant Reduction Rules to optimize plans
   - A new significantTerms Streaming Expression that is able to extract
   the significant terms in an index
   - StreamHandler is now able to use runtimeLib jars
   - Arithmetic operations are added to the SelectStream
   - Added modernized self-documenting /v2 API
   - The .system collection is now created on first request if it does not
   exist
   - Admin UI: Added shard deletion button
   - Metrics API now supports non-numeric metrics (version, disk type,
   component state, system properties...)
   - The disk free and aggregated disk free metrics are now reported
   - The DirectUpdateHandler2 now implements MetricsProducer and exposes
   stats via the metrics api and configured reporters.
   - BlockCache is faster due to less failures when caching a new block
   - MMapDirectoryFactory now supports "preload" option to ask mapped pages
   to be loaded into physical memory on init
   - Security: BasicAuthPlugin now supports standalone mode
   - Arbitrary java system properties can be passed to zkcli
   - SolrHttpClientBuilder can be configured via java system property
   - Javadocs and Changes.html are no longer included in the binary
   distribution, but are hosted online

Further details of changes are available in the change log available at:
http://lucene.apache.org/solr/6_5_0/changes/Changes.html

Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html)
Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.


[ANNOUNCE] Apache Solr 6.4.0 released

2017-01-23 Thread jim ferenczi
23 January 2017 - Apache Solr™ 6.4.0 Available
The Lucene PMC is pleased to announce the release of Apache Solr 6.4.0.

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

Solr 6.4.0 is available for immediate download at:
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Highlights of this Solr release include:

Streaming:
  * Addition of a HavingStream to Streaming API and Streaming Expressions
  * Addition of a priority Streaming Expression
  * Streaming expressions now support collection aliases

Machine Learning:
  * Configurable Learning-To-Rank (LTR) support: upload feature
definitions, extract feature values, upload your own machine learnt models
and use them to rerank search results.

Faceting:
  * Added "param" query type to facet domain filter specification to obtain
filters via query parameters
  * Any facet command can be filtered using a new parameter filter.
Example: { type:terms, field:category, filter:"user:yonik" }

Scripts / Command line:
  * A new command-line tool to manage the snapshots functionality
  * bin/solr and bin/solr.cmd now use mkroot command

SolrCloud / SolrJ
  * LukeResponse now supports dynamic fields
  * Solrj client now supports hierarchical clusters and other topics marker
  * Collection backup/restore are extensible.

Security:
  * Support Secure Impersonation / Proxy User for Solr authentication
  * Key Store type can be specified in solr.in.sh file for SSL
  * New generic authentication plugins: 'HadoopAuthPlugin' and
'ConfigurableInternodeAuthHadoopPlugin' that delegate all functionality to
Hadoop authentication framework

Query / QueryParser / Highlighting:
  * A new highlighter: The Unified Highlighter. Try it via
hl.method=unified; many popular highlighting parameters / features are
supported. It's the highest performing highlighter, especially for large
documents. Highlighting phrase queries and exotic queries are supported
equally as well as the Original Highlighter (aka the default/standard one).
Please use this new highlighter and report issues since it will likely
become the default one day.
  * Leading wildcard in complexphrase query parser are now accepted and
optimized with the ReversedWildcardFilterFactory when it's provided

Metrics:
  * Use metrics-jvm library to instrument jvm internals such as GC, memory
usage and others.
  * A lot of metrics have been added to the collection: index merges, index
store I/Os, query, update, core admin, core load thread pools, shard
replication, tlog replay and replicas
  * A new /admin/metrics API to return all metrics collected by Solr via
API.

Misc changes:
  * The new config parameter 'maxRamMB' can now limit the memory consumed by
the FastLRUCache
  * A new document processor 'SkipExistingDocumentsProcessor' that skips
duplicate inserts and ignores updates to missing docs
  * FieldCache information fetched via the mbeans handler or seen via the
UI now displays the total size used.
  * A new config flag 'enable' allows to enable/disable any cache

Please note, this release cannot be built from source with Java 8 update
121, use an earlier version instead! This is caused by a bug introduced
into the Javadocs tool shipped with that update. The workaround was too
late for this Lucene release. Of course, you can use the binary artifacts.

See the Solr CHANGES.txt files included with the release for a full list of
details.

Thanks,
Jim Ferenczi


RE: Modifying fl in QParser

2016-09-01 Thread Beale, Jim (US-KOP)
All,

Thank you all for your responses.

Our custom QParser identifies several of many dynamic fields for construction 
of the actual Lucene query.  

For instance, given a custom Solr request consisting of 
"q={!xyzQP}scid:500247", a new query is to be constructed using information 
from a SQL query which selects certain dynamic fields, e.g. p_001, q_004, r_007 
along the lines of "+p_001:abc +q_004:abc +r_007:abc".  The requirement is to 
then configure fl to only include the stored fields psf_001, qsf_004 and 
rsf_007 in the response, but not the many other stored fields that are not 
relevant to the query.

What is the best way to accomplish this?  It would be convenient to be able to 
modify fl in the QParser.

Also, note that the index is not sharded. 

Thanks!

Jim
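
A rough sketch of the SearchComponent route suggested in the replies quoted
below (package, class and the hard-coded field names are made up; written
against the 5.x/6.x component API):

  package com.example;

  import org.apache.solr.common.params.CommonParams;
  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  public class FlRewriteComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) {
      // In the real component these names would come from the SQL lookup
      // described above; hard-coded here purely for illustration.
      String extra = "psf_001,qsf_004,rsf_007";

      ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
      String fl = params.get(CommonParams.FL);
      params.set(CommonParams.FL, fl == null ? extra : fl + "," + extra);
      rb.req.setParams(params);  // QueryComponent reads fl after this point
    }

    @Override
    public void process(ResponseBuilder rb) {
      // nothing to do at process time
    }

    public String getDescription() { return "rewrites fl before QueryComponent"; }
    public String getSource() { return ""; }
  }

registered in solrconfig.xml and placed ahead of the standard components:

  <searchComponent name="flRewrite" class="com.example.FlRewriteComponent"/>

  <arr name="first-components"><str>flRewrite</str></arr>

Unlike a QParser, a component in first-components is a supported place to
rewrite request parameters, which is the point Erik makes below.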

-Original Message-
From: Rohit Kanchan [mailto:rohitkan2...@gmail.com] 
Sent: Wednesday, August 31, 2016 12:42 AM
To: solr-user@lucene.apache.org
Subject: Re: Modifying fl in QParser

We are dealing with same thing, we have overridden QueryComponent (type of
SearchComponent) and added a field to retrieve there. That same field we are 
setting in SolrParams from query request. According to your business you need 
to figure out  how you can override QueryComponent. I hope this helps in 
solving your problem.

Thanks
Rohit Kanchan


On Tue, Aug 30, 2016 at 5:11 PM, Erik Hatcher <erik.hatc...@gmail.com>
wrote:

> Personally, I don’t think a QParser(Plugin) is the right place to modify
> other parameters, only to create a Query object.   A QParser could be
> invoked from an fq, not just a q, and will get invoked on multiple 
> nodes in SolrCloud, for example - this is why I think it’s not a good 
> idea to do anything but return a Query.
>
> It is possible (in fact I’m dealing with this very situation with a client
> as we speak) to set parameters this way, but I don’t recommend it.   Create
> a SearchComponent to do this job instead.
>
>     Erik
>
>
>
> > On Aug 9, 2016, at 10:23 AM, Beale, Jim (US-KOP) 
> > <jim.be...@hibu.com>
> wrote:
> >
> > Hi,
> >
> > Is it possible to modify the SolrParam, fl, to append selected 
> > dynamic
> fields, while rewriting a query in QParser.parse()?
> >
> > Thanks in advance!
> >
> >
> > Jim Beale
> > Senior Lead Developer
> > 2201 Renaissance Boulevard, King of Prussia, PA, 19406
> > Mobile: 610-220-3067
> >
> >
> >
> > The information contained in this email message, including any
> attachments, is intended solely for use by the individual or entity 
> named above and may be confidential. If the reader of this message is 
> not the intended recipient, you are hereby notified that you must not 
> read, use, disclose, distribute or copy any part of this 
> communication. If you have received this communication in error, 
> please immediately notify me by email and destroy the original message, 
> including any attachments. Thank you.
> **hibu IT Code:141459300**
>
>


RE: Modifying fl in QParser

2016-08-30 Thread Beale, Jim (US-KOP)
Anyone??

From: Beale, Jim (US-KOP) [mailto:jim.be...@hibu.com]
Sent: Tuesday, August 09, 2016 1:23 PM
To: solr-user@lucene.apache.org
Subject: Modifying fl in QParser

Hi,

Is it possible to modify the SolrParam, fl, to append selected dynamic fields, 
while rewriting a query in QParser.parse()?

Thanks in advance!


Jim Beale
Senior Lead Developer
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Mobile: 610-220-3067

[cid:image001.png@01CD6E5F.BE5E6C20]

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you. **hibu IT Code:141459300**


Modifying fl in QParser

2016-08-09 Thread Beale, Jim (US-KOP)
Hi,

Is it possible to modify the SolrParam, fl, to append selected dynamic fields, 
while rewriting a query in QParser.parse()?

Thanks in advance!


Jim Beale
Senior Lead Developer
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Mobile: 610-220-3067

[cid:image001.png@01CD6E5F.BE5E6C20]

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you. **hibu IT Code:141459300**


Re: java.lang.InterruptedException while reloading core

2016-07-14 Thread Jim Martin
I see this occasionally. I've wondered if it's related to SOLR-4161.

This error also correlates with a sudden growth in memory usage within the
VM. Usually, this memory growth does not clear up on its own.

I'm using Solr 4.10.4.



On 6/26/16, 9:11 PM, "jarpondy"  wrote:

>Hi
>
>We are seeing the below error(No files to download for index generation)
>followed by Interrupted exception.
>
>org.apache.solr.handler.SnapPuller; No files to download for index
>generation:
>
>
>org.apache.solr.common.SolrException; SnapPull failed
>:org.apache.solr.common.SolrException: Index fetch failed :
>   at
>org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:503)
>   at
>org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java
>:322)
>   at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
>   at
>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>   at
>java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acces
>s$301(ScheduledThreadPoolExecutor.java:178)
>   at
>java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(S
>cheduledThreadPoolExecutor.java:293)
>   at
>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
>1145)
>   at
>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
>:615)
>   at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.RuntimeException: Interrupted while waiting for core
>reload to finish
>   at org.apache.solr.handler.SnapPuller.reloadCore(SnapPuller.java:721)
>   at
>org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:473)
>   ... 9 more
>Caused by: java.lang.InterruptedException
>   at
>java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInter
>ruptibly(AbstractQueuedSynchronizer.java:996)
>   at
>java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterru
>ptibly(AbstractQueuedSynchronizer.java:1303)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
>   at org.apache.solr.handler.SnapPuller.reloadCore(SnapPuller.java:718)
>   ... 10 more
>
>
>We are using Solr4.6.1 Version.
>Any pointers why the core reload fails or what causes this to fail and how
>can we debug this issue.
>
>
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/java-lang-InterruptedException-while-re
>loading-core-tp4284423.html
>Sent from the Solr - User mailing list archive at Nabble.com.
>




CONFIDENTIALITY NOTICE: This message is intended only for the use and review of 
the individual or entity to which it is addressed and may contain information 
that is privileged and confidential. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message solely to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify 
sender immediately by telephone or return email. Thank you.


collection configuration stored in Zoo Keeper with solrCloud

2016-01-11 Thread Jim Shi
Hi, I have a question regarding collection configurations stored in ZooKeeper with 
SolrCloud.
All collection configurations are stored in ZooKeeper. What happens if you 
want to restart all ZooKeeper instances? Does ZooKeeper persist data on 
disk so that it can restore all configurations from disk?

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
Well it seems that doing q="network se*" is working but not in the way you
expect. Doing this q="network se*" would not trigger a prefix query and the
"*" character would be treated as any character. I suspect that your query
is in fact "network se" (assuming you're using a StandardTokenizer) and
that the word "se" is very popular in your documents. That would explain
the slow response time. Bottom line is that doing "network se*" will not
trigger prefix query at all (I may be wrong but this is the expected
behaviour for Solr up to 4.3).
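
For completeness, a sketch of how an actual prefix-inside-a-phrase can be
written, assuming Solr 4.8 or later where the complexphrase parser is
available (the field name is the placeholder used in the thread):

  q={!complexphrase inOrder=true}field:"network se*"

The se* prefix still expands to many terms, so this only makes the intent
explicit; it does not by itself remove the cost problem discussed below.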

2015-11-02 13:47 GMT+01:00 Modassar Ather <modather1...@gmail.com>:

> The problem is with the same query as phrase. q="network se*".
>
> The last . is fullstops for the sentence and the query is q=field:"network
> se*"
>
> Best,
> Modassar
>
> On Mon, Nov 2, 2015 at 6:10 PM, jim ferenczi <jim.feren...@gmail.com>
> wrote:
>
> > Oups I did not read the thread carrefully.
> > *The problem is with the same query as phrase. q="network se*".*
> > I was not aware that you could do that with Solr ;). I would say this is
> > expected because in such case if the number of expansions for "se*" is
> big
> > then you would have to check the positions for a significant words. I
> don't
> > know if there is a limitation in the number of expansions for a prefix
> > query contained into a phrase query but I would look at this parameter
> > first (limit the number of expansion per prefix search, let's say the N
> > most significant words based on the frequency of the words for instance).
> >
> > 2015-11-02 13:36 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:
> >
> > >
> > >
> > >
> > > *I am not able to get  the above point. So when I start Solr with 28g
> > RAM,
> > > for all the activities related to Solr it should not go beyond 28g. And
> > the
> > > remaining heap will be used for activities other than Solr. Please help
> > me
> > > understand.*
> > >
> > > Well those 28GB of heap are the memory "reserved" for your Solr
> > > application, though some parts of the index (not to say all) are
> > retrieved
> > > via MMap (if you use the default MMapDirectory) which do not use the
> heap
> > > at all. This is a very important part of Lucene/Solr, the heap should
> be
> > > sized in a way that let a significant amount of RAM available for the
> > > index. If not then you rely on the speed of your disk, if you have SSDs
> > > it's better but reads are still significantly slower with SSDs than
> with
> > > direct RAM access. Another thing to keep in mind is that mmap will
> always
> > > tries to put things in RAM, this is why I suspect that you swap
> activity
> > is
> > > killing your performance.
> > >
> > > 2015-11-02 11:55 GMT+01:00 Modassar Ather <modather1...@gmail.com>:
> > >
> > >> Thanks Jim for your response.
> > >>
> > >> The remaining size after you removed the heap usage should be reserved
> > for
> > >> the index (not only the other system activities).
> > >> I am not able to get  the above point. So when I start Solr with 28g
> > RAM,
> > >> for all the activities related to Solr it should not go beyond 28g.
> And
> > >> the
> > >> remaining heap will be used for activities other than Solr. Please
> help
> > me
> > >> understand.
> > >>
> > >> *Also the CPU utilization goes upto 400% in few of the nodes:*
> > >> You said that only machine is used so I assumed that 400% cpu is for a
> > >> single process (one solr node), right ?
> > >> Yes you are right that 400% is for single process.
> > >> The disks are SSDs.
> > >>
> > >> Regards,
> > >> Modassar
> > >>
> > >> On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi <jim.feren...@gmail.com>
> > >> wrote:
> > >>
> > >> > *if it correlates with the bad performance you're seeing. One
> > important
> > >> > thing to notice is that a significant part of your index needs to be
> > in
> > >> RAM
> > >> > (especially if you're using SSDs) in order to achieve good
> > performance.*
> > >> >
> > >> > Especially if you're not using SSDs, sorry ;)
> > >> >
> > >> > 2015-11-02 11:38 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:
> > >>

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
12 shards with 28GB for the heap and 90GB for each index means that you
need at least 336GB for the heap (assuming you're using all of it which may
be easily the case considering the way the GC is handling memory) and ~=
1TB for the index. Let's say that you don't need your entire index in RAM,
the problem as I see it is that you don't have enough RAM for your index +
heap. Assuming your machine has 370GB of RAM there are only 34GB left for
your index, 1TB/34GB means that you can only have 1/30 of your entire index
in RAM. I would advise you to check the swap activity on the machine and
see if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance:



*As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
nodes total) is assigned 336 GB. The rest is still a good for other system
activities.*
The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).


*Also the CPU utilization goes upto 400% in few of the nodes:*
You said that only machine is used so I assumed that 400% cpu is for a
single process (one solr node), right ?
This seems impossible if you are sure that only one query is played at a
time and no indexing is performed. Best thing to do is to dump stack trace
of the solr nodes during the query and to check what the threads are doing.

Jim
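
A couple of standard Linux commands that can be used for that swap check
(shown only as an illustration, not taken from the thread):

  free -g        # overall used / cached / swapped memory
  vmstat 5       # non-zero si/so columns mean the box is actively swapping
  top -o %MEM    # which processes are holding the memory

If swapping shows up while the query runs, shrinking the per-node heap (for
example starting each node with bin/solr start -m 8g instead of 28g) leaves
more RAM for the OS page cache that MMapDirectory depends on.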



2015-11-02 10:38 GMT+01:00 Modassar Ather <modather1...@gmail.com>:

> Just to add one more point that one external Zookeeper instance is also
> running on this particular machine.
>
> Regards,
> Modassar
>
> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <modather1...@gmail.com>
> wrote:
>
> > Hi Toke,
> > Thanks for your response. My comments in-line.
> >
> > That is 12 machines, running a shard each?
> > No! This is a single big machine with 12 shards on it.
> >
> > What is the total amount of physical memory on each machine?
> > Around 370 gb on the single machine.
> >
> > Well, se* probably expands to a great deal of documents, but a huge bump
> > in memory utilization and 3 minutes+ sounds strange.
> >
> > - What are your normal query times?
> > Few simple queries are returned with in a couple of seconds. But the more
> > complex queries with proximity and wild cards have taken more than 3-4
> > minutes and some times some queries have timed out too where time out is
> > set to 5 minutes.
> > - How many hits do you get from 'network se*'?
> > More than a million records.
> > - How many results do you return (the rows-parameter)?
> > It is the default one 10. Grouping is enabled on a field.
> > - If you issue a query without wildcards, but with approximately the
> > same amount of hits as 'network se*', how long does it take?
> > A query resulting in around half a million record return within a couple
> > of seconds.
> >
> > That is strange, yes. Have you checked the logs to see if something
> > unexpected is going on while you test?
> > Have not seen anything particularly. Will try to check again.
> >
> > If you are using spinning drives and only have 32GB of RAM in total in
> > each machine, you are probably struggling just to keep things running.
> > As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
> > nodes total) is assigned 336 GB. The rest is still a good for other
> system
> > activities.
> >
> > Thanks,
> > Modassar
> >
> > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
> > wrote:
> >
> >> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> >> > I have a setup of 12 shard cluster started with 28gb memory each on a
> >> > single server. There are no replica. The size of index is around 90gb
> on
> >> > each shard. The Solr version is 5.2.1.
> >>
> >> That is 12 machines, running a shard each?
> >>
> >> What is the total amount of physical memory on each machine?
> >>
> >> > When I query "network se*", the memory utilization goes upto 24-26 gb
> >> and
> >> > the query takes around 3+ minutes to execute. Also the CPU utilization
> >> goes
> >> > upto 400% in few of the nodes.
> >>
> >> Well, se* probably expands to a great deal of documents, but a huge bump
> >> in memory utilization and 3 minutes+ sounds strange.
> >>
> >> - What are your normal query times?
> >> - How many hits do you get from 'network se*'?
> >> - How many result

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
*if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance.*

Especially if you're not using SSDs, sorry ;)

2015-11-02 11:38 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:

> 12 shards with 28GB for the heap and 90GB for each index means that you
> need at least 336GB for the heap (assuming you're using all of it which may
> be easily the case considering the way the GC is handling memory) and ~=
> 1TO for the index. Let's say that you don't need your entire index in RAM,
> the problem as I see it is that you don't have enough RAM for your index +
> heap. Assuming your machine has 370GB of RAM there are only 34GB left for
> your index, 1TO/34GB means that you can only have 1/30 of your entire index
> in RAM. I would advise you to check the swap activity on the machine and
> see if it correlates with the bad performance you're seeing. One important
> thing to notice is that a significant part of your index needs to be in RAM
> (especially if you're using SSDs) in order to achieve good performance:
>
>
>
> *As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
> nodes total) is assigned 336 GB. The rest is still a good for other system
> activities.*
> The remaining size after you removed the heap usage should be reserved for
> the index (not only the other system activities).
>
>
> *Also the CPU utilization goes upto 400% in few of the nodes:*
> You said that only machine is used so I assumed that 400% cpu is for a
> single process (one solr node), right ?
> This seems impossible if you are sure that only one query is played at a
> time and no indexing is performed. Best thing to do is to dump stack trace
> of the solr nodes during the query and to check what the threads are doing.
>
> Jim
>
>
>
> 2015-11-02 10:38 GMT+01:00 Modassar Ather <modather1...@gmail.com>:
>
>> Just to add one more point that one external Zookeeper instance is also
>> running on this particular machine.
>>
>> Regards,
>> Modassar
>>
>> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <modather1...@gmail.com>
>> wrote:
>>
>> > Hi Toke,
>> > Thanks for your response. My comments in-line.
>> >
>> > That is 12 machines, running a shard each?
>> > No! This is a single big machine with 12 shards on it.
>> >
>> > What is the total amount of physical memory on each machine?
>> > Around 370 gb on the single machine.
>> >
>> > Well, se* probably expands to a great deal of documents, but a huge bump
>> > in memory utilization and 3 minutes+ sounds strange.
>> >
>> > - What are your normal query times?
>> > Few simple queries are returned with in a couple of seconds. But the
>> more
>> > complex queries with proximity and wild cards have taken more than 3-4
>> > minutes and some times some queries have timed out too where time out is
>> > set to 5 minutes.
>> > - How many hits do you get from 'network se*'?
>> > More than a million records.
>> > - How many results do you return (the rows-parameter)?
>> > It is the default one 10. Grouping is enabled on a field.
>> > - If you issue a query without wildcards, but with approximately the
>> > same amount of hits as 'network se*', how long does it take?
>> > A query resulting in around half a million record return within a couple
>> > of seconds.
>> >
>> > That is strange, yes. Have you checked the logs to see if something
>> > unexpected is going on while you test?
>> > Have not seen anything particularly. Will try to check again.
>> >
>> > If you are using spinning drives and only have 32GB of RAM in total in
>> > each machine, you are probably struggling just to keep things running.
>> > As mentioned above this is a big machine with 370+ gb of RAM and Solr
>> (12
>> > nodes total) is assigned 336 GB. The rest is still a good for other
>> system
>> > activities.
>> >
>> > Thanks,
>> > Modassar
>> >
>> > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> > wrote:
>> >
>> >> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
>> >> > I have a setup of 12 shard cluster started with 28gb memory each on a
>> >> > single server. There are no replica. The size of index is around
>> 90gb on
>> >> > each shard. The Solr version is 5.2.1.
>> >>
>

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
*I am not able to get  the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.*

Well those 28GB of heap are the memory "reserved" for your Solr
application, though some parts of the index (not to say all) are retrieved
via MMap (if you use the default MMapDirectory) which do not use the heap
at all. This is a very important part of Lucene/Solr, the heap should be
sized in a way that let a significant amount of RAM available for the
index. If not then you rely on the speed of your disk, if you have SSDs
it's better but reads are still significantly slower with SSDs than with
direct RAM access. Another thing to keep in mind is that mmap will always
tries to put things in RAM, this is why I suspect that you swap activity is
killing your performance.

2015-11-02 11:55 GMT+01:00 Modassar Ather <modather1...@gmail.com>:

> Thanks Jim for your response.
>
> The remaining size after you removed the heap usage should be reserved for
> the index (not only the other system activities).
> I am not able to get  the above point. So when I start Solr with 28g RAM,
> for all the activities related to Solr it should not go beyond 28g. And the
> remaining heap will be used for activities other than Solr. Please help me
> understand.
>
> *Also the CPU utilization goes upto 400% in few of the nodes:*
> You said that only machine is used so I assumed that 400% cpu is for a
> single process (one solr node), right ?
> Yes you are right that 400% is for single process.
> The disks are SSDs.
>
> Regards,
> Modassar
>
> On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi <jim.feren...@gmail.com>
> wrote:
>
> > *if it correlates with the bad performance you're seeing. One important
> > thing to notice is that a significant part of your index needs to be in
> RAM
> > (especially if you're using SSDs) in order to achieve good performance.*
> >
> > Especially if you're not using SSDs, sorry ;)
> >
> > 2015-11-02 11:38 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:
> >
> > > 12 shards with 28GB for the heap and 90GB for each index means that you
> > > need at least 336GB for the heap (assuming you're using all of it which
> > may
> > > be easily the case considering the way the GC is handling memory) and
> ~=
> > > 1TO for the index. Let's say that you don't need your entire index in
> > RAM,
> > > the problem as I see it is that you don't have enough RAM for your
> index
> > +
> > > heap. Assuming your machine has 370GB of RAM there are only 34GB left
> for
> > > your index, 1TO/34GB means that you can only have 1/30 of your entire
> > index
> > > in RAM. I would advise you to check the swap activity on the machine
> and
> > > see if it correlates with the bad performance you're seeing. One
> > important
> > > thing to notice is that a significant part of your index needs to be in
> > RAM
> > > (especially if you're using SSDs) in order to achieve good performance:
> > >
> > >
> > >
> > > *As mentioned above this is a big machine with 370+ gb of RAM and Solr
> > (12
> > > nodes total) is assigned 336 GB. The rest is still a good for other
> > system
> > > activities.*
> > > The remaining size after you removed the heap usage should be reserved
> > for
> > > the index (not only the other system activities).
> > >
> > >
> > > *Also the CPU utilization goes upto 400% in few of the nodes:*
> > > You said that only machine is used so I assumed that 400% cpu is for a
> > > single process (one solr node), right ?
> > > This seems impossible if you are sure that only one query is played at
> a
> > > time and no indexing is performed. Best thing to do is to dump stack
> > trace
> > > of the solr nodes during the query and to check what the threads are
> > doing.
> > >
> > > Jim
> > >
> > >
> > >
> > > 2015-11-02 10:38 GMT+01:00 Modassar Ather <modather1...@gmail.com>:
> > >
> > >> Just to add one more point that one external Zookeeper instance is
> also
> > >> running on this particular machine.
> > >>
> > >> Regards,
> > >> Modassar
> > >>
> > >> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <
> modather1...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Toke,
> > >> > Thanks for your response. My comments in-line.
>

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
Oops, I did not read the thread carefully.
*The problem is with the same query as phrase. q="network se*".*
I was not aware that you could do that with Solr ;). I would say this is
expected because in such case if the number of expansions for "se*" is big
then you would have to check the positions for a significant number of words. I don't
know if there is a limitation in the number of expansions for a prefix
query contained in a phrase query but I would look at this parameter
first (limit the number of expansion per prefix search, let's say the N
most significant words based on the frequency of the words for instance).

2015-11-02 13:36 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:

>
>
>
> *I am not able to get  the above point. So when I start Solr with 28g RAM,
> for all the activities related to Solr it should not go beyond 28g. And the
> remaining heap will be used for activities other than Solr. Please help me
> understand.*
>
> Well those 28GB of heap are the memory "reserved" for your Solr
> application, though some parts of the index (not to say all) are retrieved
> via MMap (if you use the default MMapDirectory) which do not use the heap
> at all. This is a very important part of Lucene/Solr, the heap should be
> sized in a way that let a significant amount of RAM available for the
> index. If not then you rely on the speed of your disk, if you have SSDs
> it's better but reads are still significantly slower with SSDs than with
> direct RAM access. Another thing to keep in mind is that mmap will always
> tries to put things in RAM, this is why I suspect that you swap activity is
> killing your performance.
>
> 2015-11-02 11:55 GMT+01:00 Modassar Ather <modather1...@gmail.com>:
>
>> Thanks Jim for your response.
>>
>> The remaining size after you removed the heap usage should be reserved for
>> the index (not only the other system activities).
>> I am not able to get  the above point. So when I start Solr with 28g RAM,
>> for all the activities related to Solr it should not go beyond 28g. And
>> the
>> remaining heap will be used for activities other than Solr. Please help me
>> understand.
>>
>> *Also the CPU utilization goes upto 400% in few of the nodes:*
>> You said that only machine is used so I assumed that 400% cpu is for a
>> single process (one solr node), right ?
>> Yes you are right that 400% is for single process.
>> The disks are SSDs.
>>
>> Regards,
>> Modassar
>>
>> On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi <jim.feren...@gmail.com>
>> wrote:
>>
>> > *if it correlates with the bad performance you're seeing. One important
>> > thing to notice is that a significant part of your index needs to be in
>> RAM
>> > (especially if you're using SSDs) in order to achieve good performance.*
>> >
>> > Especially if you're not using SSDs, sorry ;)
>> >
>> > 2015-11-02 11:38 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:
>> >
>> > > 12 shards with 28GB for the heap and 90GB for each index means that
>> you
>> > > need at least 336GB for the heap (assuming you're using all of it
>> which
>> > may
>> > > be easily the case considering the way the GC is handling memory) and
>> ~=
>> > > 1TO for the index. Let's say that you don't need your entire index in
>> > RAM,
>> > > the problem as I see it is that you don't have enough RAM for your
>> index
>> > +
>> > > heap. Assuming your machine has 370GB of RAM there are only 34GB left
>> for
>> > > your index, 1TO/34GB means that you can only have 1/30 of your entire
>> > index
>> > > in RAM. I would advise you to check the swap activity on the machine
>> and
>> > > see if it correlates with the bad performance you're seeing. One
>> > important
>> > > thing to notice is that a significant part of your index needs to be
>> in
>> > RAM
>> > > (especially if you're using SSDs) in order to achieve good
>> performance:
>> > >
>> > >
>> > >
>> > > *As mentioned above this is a big machine with 370+ gb of RAM and Solr
>> > (12
>> > > nodes total) is assigned 336 GB. The rest is still a good for other
>> > system
>> > > activities.*
>> > > The remaining size after you removed the heap usage should be reserved
>> > for
>> > > the index (not only the other system activities).
>> > >
>> > >
>> > > *Also the CPU utilization goes upto 400% in few of the nodes:*

What does replicationFactor really do?

2015-07-16 Thread Jim . Musil
Hi,

In 5.1, we are creating a collection using the Collections API with an initial 
replicationFactor of X. This value is then stored in the state.json file for 
that collection.

If I try to issue ADDREPLICA on this cluster, it throws an error saying that 
there are no live nodes for additional replicas.

If I connect a new solr node to zookeeper and issue an ADDREPLICA call, the 
replica is created and no errors are thrown, but replicationFactor remains at X 
in the state.json file.

Why? What does replicationFactor really mean? It seems like it's being honored 
in some cases and ignored in others.

Thanks for any help you can provide.

Cheers,
Jim




Re: fq versus q

2015-06-24 Thread jim ferenczi
 In part of queries we see strange behavior where q performs 5-10x better
 than fq. The question is why?
Are you sure that the query result cache is disabled ?
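
One way to rule caching in or out while measuring (a sketch; host and core
name are placeholders): compare the cache statistics before and after each
run,

  http://localhost:8983/solr/corename/admin/mbeans?stats=true&cat=CACHE&wt=json

and look at the hits/inserts counters of queryResultCache and filterCache.
{!cache=false} only stops the filter from being cached; the q-only form can
still be answered from queryResultCache on a repeated run, which alone could
explain a 5-10x difference.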

2015-06-24 13:28 GMT+02:00 Esther Goldbraich estherg...@il.ibm.com:

 Hi,

 We are comparing the performance of fq versus q for queries that are
 actually filters and should not be cached.
 In part of queries we see strange behavior where q performs 5-10x better
 than fq. The question is why?

 An example1:
 q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1
 to DATE2}
 sort=maildate_sort* desc
 rows=50
 start=0
 group=true
 group.query=some query (without dates)
 group.query=*:*
 group.sort=maildate_sort desc
 additional fqs

 Schema:
 <field name="maildate" stored="true" indexed="true" type="tdate"/>
 <field name="maildate_sort" stored="false" indexed="false" type="tdate" docValues="true"/>

 Thank you,
 Esther
 -
 Esther Goldbraich
 Social Technologies  Analytics - IBM Haifa Research Lab
 Phone: +972-4-8281059


CREATE collection bug or feature?

2015-06-19 Thread Jim . Musil
I noticed that when I issue the CREATE collection command to the api, it does 
not automatically put a replica on every live node connected to zookeeper.

So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and 
create a collection like this:

/admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config

It will only create a core on one of the three nodes. I can make it work if I 
change replicationFactor to 3. When standing up an entire stack using chef, 
this all gets a bit clunky. I don't see any option such as ALL that would 
just create a replica on all nodes regardless of size.

I'm guessing this is intentional, but curious about the reasoning.

Thanks!
Jim


Re: CREATE collection bug or feature?

2015-06-19 Thread Jim . Musil
Thanks as always for the great answers!

Jim


On 6/19/15, 11:57 AM, Erick Erickson erickerick...@gmail.com wrote:

Jim:

This is by design. There's no way to tell Solr to find all the cores
available and put one replica on each. In fact, you're explicitly
telling it to create one and only one replica, one and only one shard.
That is, your collection will have exactly one low-level core. But you
realized that...

As to the reasoning. Consider hetergeneous collections all hosted on
the same Solr cluster. I have big collections, little collections,
some with high QPS rates, some not. etc. Having Solr do things like
this automatically would make managing this difficult.

Probably the real reason is nobody thought it would be useful in
the general case. And I probably concur. Adding a new node to an
existing cluster would result in unbalanced clusters etc.

I suppose a stop-gap would be to query the live_nodes in the cluster
and add that to the URL, don't know how much of a pain that would be
though.

Best,
Erick
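
A sketch of that stop-gap (node names are placeholders; the list of live
nodes can be read from /live_nodes in ZooKeeper or from action=CLUSTERSTATUS),
using the createNodeSet parameter of CREATE:

  /admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=3&maxShardsPerNode=1&collection.configName=my_config&createNodeSet=host1:8983_solr,host2:8983_solr,host3:8983_solr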

On Fri, Jun 19, 2015 at 10:15 AM, Jim.Musil jim.mu...@target.com wrote:
 I noticed that when I issue the CREATE collection command to the api,
it does not automatically put a replica on every live node connected to
zookeeper.

 So, for example, if I have 3 solr nodes connected to a zookeeper
ensemble and create a collection like this:

 
/admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config

 It will only create a core on one of the three nodes. I can make it
work if I change replicationFactor to 3. When standing up an entire
stack using chef, this all gets a bit clunky. I don't see any option
such as ALL that would just create a replica on all nodes regardless
of size.

 I'm guessing this is intentional, but curious about the reasoning.

 Thanks!
 Jim



Collections API and adding new boxes

2015-06-18 Thread Jim . Musil
Hi,

Let's say I have a zookeeper ensemble with several Solr nodes connected to it. 
I've created a collection successfully and all is well.

What happens when I want to add another solr node?

I've tried spinning one up and connecting it to zookeeper, but the new node 
doesn't join the collection.  What's the expected next step?

This is Solr 5.1.

Thanks!
Jim Musil
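
For later readers, the usual next step is an explicit ADDREPLICA call naming
the new node (collection, shard and node below are placeholders):

  /admin/collections?action=ADDREPLICA&collection=my_collection&shard=shard1&node=newhost:8983_solr

Joining ZooKeeper only makes the node available to the cluster; in 5.x no
replica is placed on it until the Collections API is asked to do so.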


Re: Clarification on Collections API for 5.x

2015-05-27 Thread Jim . Musil
bump

On 5/21/15, 9:06 AM, Jim.Musil jim.mu...@target.com wrote:

Hi,

In the guide for moving from Solr 4.x to 5.x, it states the following:

Solr 5.0 only supports creating and removing SolrCloud collections
through the Collections API
(https://cwiki.apache.org/confluence/display/solr/Collections+API),
unlike previous versions. While not using the collections API may still
work in 5.0, it is unsupported, not recommended, and the behavior will
change in a 5.x release.

Currently, we launch several solr nodes with identical cores defined
using the new Core Discovery process. These nodes are also connected to a
zookeeper ensemble. Part of the core definition is to set the configSet
to use. This configSet is uploaded to zookeeper separately. This
effectively creates a Collection.

Is this method no longer supported in 5.x?

Thanks!
Jim Musil




Re: Clarification on Collections API for 5.x

2015-05-27 Thread Jim . Musil
Thanks for the clarification!

On 5/27/15, 12:00 PM, Erick Erickson erickerick...@gmail.com wrote:

Are you defining shard and replicas here? Or is this just a
single-node collection? In any case, this seems unnecessary. You'd get
the same thing by having your uploading the config set to ZK, then
just issuing a Collections CREATE command, specifying the node to use
if desired.

What you're doing _should_ work, because essentially that's what start
up does. It finds cores somewhere below SOLR_HOME and reads the
core.properties file. When it finds parameters like collection, shard,
coreNodeName, numShards, all that stuff it figures things out. But,
you have to get all this right manually with the process you're using
now, why take the risk? Besides, in the future you'll have to adapt to
any back-compat breaks...

Best,
Erick
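
A sketch of the sequence Erick describes, with placeholder paths and names:

  server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/my_config/conf -confname my_config

  /admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=2&collection.configName=my_config

(createNodeSet=... can be added to the CREATE call to pin the replicas to
specific nodes.)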

On Wed, May 27, 2015 at 8:34 AM, Jim.Musil jim.mu...@target.com wrote:
 bump

 On 5/21/15, 9:06 AM, Jim.Musil jim.mu...@target.com wrote:

Hi,

In the guide for moving from Solr 4.x to 5.x, it states the following:

Solr 5.0 only supports creating and removing SolrCloud collections
through the Collections API
(https://cwiki.apache.org/confluence/display/solr/Collections+API),
unlike previous versions. While not using the collections API may still
work in 5.0, it is unsupported, not recommended, and the behavior will
change in a 5.x release.

Currently, we launch several solr nodes with identical cores defined
using the new Core Discovery process. These nodes are also connected to
a
zookeeper ensemble. Part of the core definition is to set the configSet
to use. This configSet is uploaded to zookeeper separately. This
effectively creates a Collection.

Is this method no longer supported in 5.x?

Thanks!
Jim Musil





Clarification on Collections API for 5.x

2015-05-21 Thread Jim . Musil
Hi,

In the guide for moving from Solr 4.x to 5.x, it states the following:

Solr 5.0 only supports creating and removing SolrCloud collections through the 
Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API), unlike 
previous versions. While not using the collections API may still work in 5.0, 
it is unsupported, not recommended, and the behavior will change in a 5.x 
release.

Currently, we launch several solr nodes with identical cores defined using the 
new Core Discovery process. These nodes are also connected to a zookeeper 
ensemble. Part of the core definition is to set the configSet to use. This 
configSet is uploaded to zookeeper separately. This effectively creates a 
Collection.

Is this method no longer supported in 5.x?

Thanks!
Jim Musil



ConfigSets and SolrCloud

2015-05-20 Thread Jim . Musil
Hi,

I need a little clarification on configSets in solr 5.x.

According to this page:

https://cwiki.apache.org/confluence/display/solr/Config+Sets

I can create named configSets to be shared by other cores. If I create them 
using this method AND am operating in SolrCloud mode, will it automatically 
upload these named config sets to zookeeper?

Thanks!
Jim Musil


Confusion about zkcli.sh and solr.war

2015-05-13 Thread Jim . Musil
I'm trying to use zkcli.sh to upload configurations to zookeeper and solr 5.1.

It's throwing an error because it references webapps/solr.war which no longer 
exists.

Do I have to build my own solr.war in order to use zkcli.sh?

Please forgive me if I'm missing something here.

Jim Musil


Re: Solr returns incorrect results after sorting

2015-03-19 Thread jim ferenczi
Then you just have to remove the group.sort especially if your group limit
is set to 1.
On 19 March 2015 at 16:57, kumarraj rajitpro2...@gmail.com wrote:

 *if the number of documents in one group is more than one then you cannot
 ensure that this document reflects the main sort

 Is there a way the top record which is coming up in the group is considered
 for sorting?
 We require to show the record from 212(even though price is low) in both
 the
 cases of high to low or low to high..and still the main sorting should
 work?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4194008.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr returns incorrect results after sorting

2015-03-18 Thread jim ferenczi
Hi Raj,
The group.sort you are using defines multiple criteria. The first criterion
is the big solr function starting with the max. This means that inside
each group the documents will be sorted by this criterion and if the values
are equal between two documents then the comparison falls back to the
second criterion (inStock_boolean desc) and so on.

*Even though if i add price asc in the group.sort, but still the main
sort does not consider that.*
The main sort does not have to consider what's in the group.sort. The
group.sort defines the way the documents are sorted inside each group. So
if you want to sort the documents inside each group in the same order as
in the main sort you can remove the group.sort or you can have a primary
sort on pricecommon_double desc in your group.sort:
*group.sort=pricecommon_double
desc, 
max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0))
desc,inStock_boolean
desc,geodist() asc*


Cheers,
Jim



2015-03-18 7:28 GMT+01:00 kumarraj rajitpro2...@gmail.com:

 Hi Jim,

 Yes, you are right.. that document is having price 499.99,
 But i want to consider the first record in the group as part of the main
 sort.
 Even though if i add price asc in the group.sort, but still the main sort
 does not consider that.

 group.sort=max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0))
 desc,inStock_boolean descgeodist() asc,pricecommon_double
 ascsort=pricecommon_double desc

 Is there any other workaround so that sort is always based on the first
 record which is pulled up in each group?


 Regards,
 Raj



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4193658.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr returns incorrect results after sorting

2015-03-17 Thread jim ferenczi
Hi,
Please note that you have two sort criteria, one to sort the documents
inside each group and one to sort the groups. In the example you sent, the
group 10002 has two documents and your group.limit is set to 1. If you redo
the query with group.limit=2 I suspect that you'll see the second document
of this group with a pricecommon_double between 479.99 and 729.97. This
would mean that the sorting is correct ;). Bottom line is that when you
have a group.sort different than the main sort, if the number of documents
in one group is more than one then you cannot ensure that this document
reflects the main sort. Try for instance group.sort=pricecommon_double
asc (main sort inverse order) and you'll see that the sort inside each
group is always applied after the main sort. This is the only way to meet
the expectations ;).

Cheers,
Jim
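
A quick way to check this is to rerun the same query with the group limit raised (a sketch; the group field name is assumed from the sample response in the quoted message below):

&group=true&group.field=code_string&group.limit=2&sort=pricecommon_double desc

If the second document of group 10002 then shows a pricecommon_double between 479.99 and 729.97, the main sort is behaving as described above.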



2015-03-17 9:48 GMT+01:00 kumarraj rajitpro2...@gmail.com:

 Thanks David, that was a typo.
 Do you see any other issues? While solr does the grouping and if more than
 one document which are matched with given group.sort condition(numfound=2),
 then that particular document is not sorted correctly, when sorted by
 price.(sort=price) is applied across all the groups.

 Example: Below is the sample result.

 <arr name="groups">
   <lst>
     <str name="groupValue">10001</str>
     <result name="doclist" numFound="1" start="0">
       <doc>
         <double name="pricecommon_double">729.97</double>
         <str name="code_string">10001</str>
         <str name="name_text">Product1</str>
         <str name="storeName_string">203</str>
         <double name="geodist()">198.70324062133778</double>
       </doc>
     </result>
   </lst>
   <lst>
     <str name="groupValue">10002</str>
     <result name="doclist" numFound="2" start="0">
       <doc>
         <double name="pricecommon_double">279.99</double>
         <str name="code_string">10002</str>
         <str name="name_text">Product2</str>
         <str name="storeName_string">212</str>
         <double name="geodist()">0.0</double>
       </doc>
     </result>
   </lst>
   <lst>
     <str name="groupValue">10003</str>
     <result name="doclist" numFound="1" start="0">
       <doc>
         <double name="pricecommon_double">479.99</double>
         <str name="code_string">10003</str>
         <str name="name_text">Product3</str>
         <str name="storeName_string">203</str>
         <double name="geodist()">198.70324062133778</double>
       </doc>
     </result>
   </lst>

 I expect product 10002 to be sorted and shown after 10003, but it is not
 sorted correctly.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4193457.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Possible to dump clusterstate, system stats into solr log?

2015-02-11 Thread Jim . Musil
Hi,

Is it possible to periodically dump the cluster state contents (or system 
diagnostics) into the main solr log file?

We have many security protocols in place that prevent us from running 
diagnostic requests directly to the solr boxes, but we do have access to the 
shipped logs.

Thanks!
Jim


Re: Where can we set the parameters in Solr Config?

2015-02-03 Thread Jim . Musil
We set them as extra parameters sent to the servlet container (Jetty or Tomcat).

eg java -Dsolr.lock.type=native -jar start.jar

Jim
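
The same properties can also be set per core in a solrcore.properties file, which Solr loads and makes available for ${...} substitution in solrconfig.xml (a sketch; the property shown is the one from the example above):

# solrcore.properties in the core's conf directory
solr.lock.type=native

If a property is defined nowhere, the value after the colon in ${solr.lock.type:native} is used as the default.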

On 2/3/15, 11:58 AM, O. Olson olson_...@yahoo.it wrote:

I'm sorry if this is a basic question, but I am curious where, or at
least,
how can we set the parameters in the solrconfig.xml.

E.g. Consider the solrconfig.xml shown here:
http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/solr/exa
mple/example-DIH/solr/db/conf/solrconfig.xml?revision=1638496view=markup

There seems to be a lot of
${ParameterName:Value}
E.g.
<lockType>${solr.lock.type:native}</lockType>

Where do these parameter values get set? Thank you in anticipation.




--
View this message in context:
http://lucene.472066.n3.nabble.com/Where-can-we-set-the-parameters-in-Solr
-Config-tp4183706.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR retrieve data using URL

2015-02-02 Thread Jim . Musil
You don't have to use SolrJ. It's just a web request to a url, so just
issue the request in Java and parse the JSON response.

http://stackoverflow.com/questions/7467568/parsing-json-from-url

SolrJ does make it simpler, however.

Jim
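
A bare-bones sketch of that approach (no SolrJ; the URL and facet field are taken from the question below, everything else is illustrative and untested):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RawSolrQuery {
    public static void main(String[] args) throws Exception {
        // Full select URL, including wt=json so the body comes back as JSON
        String url = "http://server:8983/solr/collection1/select"
                + "?q=*%3A*&wt=json&indent=true"
                + "&facet=true&facet.field=PLS_SURVY_SURVY_LANG_CHOICE_MAP";
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL(url).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        // Hand the body off to any JSON parser (Jackson, Gson, org.json, ...)
        System.out.println(body);
    }
}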

On 2/2/15, 12:57 PM, mathewvino vinojmat...@hotmail.com wrote:

Hi There,

I am using solrj API to make call to Solr Server with the data that I am
looking for. Basically I am using
solrj api as below to get the data. Everything is working as expected

HttpSolrServer solr = new
HttpSolrServer("http://server:8983/solr/collection1");
SolrQuery query = new SolrQuery("*:*");
query.setFacet(true).addFacetField("PLS_SURVY_SURVY_STATUS_MAP");

Is there any API where I can use the complete URL to get the data, like below?

HttpSolrServer solr = new
HttpSolrServer("http://server:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&facet=true&facet.field=PLS_SURVY_SURVY_LANG_CHOICE_MAP");

I would like to pass the complete url to get the data instead of using the
solrj
query api.

Thanks






--
View this message in context:
http://lucene.472066.n3.nabble.com/SOLR-retrieve-data-using-URL-tp4183536.
html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr throwing SocketException: Connection Reset

2015-02-02 Thread Jim . Musil
This is difficult to diagnose, but here are some questions I would ask
myself:

Can you reliably recreate the error?
Can you recreate the error faster by writing to all 100 collections at
once?
Can you recreate the error faster with fewer nodes?

Is just one solr node or one solr collection throwing the error?

Are all the updates coming from one machine?
Is there some other bottleneck in your network (like a load balancer) that
is limiting connections?

Good luck,
Jim Musil


On 2/2/15, 5:29 AM, nkgupta nitinkumargu...@gmail.com wrote:

I have 8 node solr cloud cluster connected with external zookeeper. Each
node
: 30 Gb, 4 core.
I have created around 100 collections, each collection is having approx.
30
shards. (Why I need it, let be a different story, business isolation,
business requirement could be anything).

Now, I am ingesting data into cluster on 30 collections simultaneously. I
see that ingestion to few collections is getting failed. In solr logs, I
can
see this Connection Reset exception occurring. Overall time for
ingestion
is in the tune of 10 hours.

Any suggestion? Even if it is due to resource starvation how can I prove
that connection reset is coming because of lack of resources.

 Exception ==
2015-01-30 09:16:14,454 ERROR [updateExecutor-1-thread-8151] ? (:) - error
java.net.SocketException: Connection reset
   at java.net.SocketInputStream.read(SocketInputStream.java:196)
~[?:1.7.0_55]
   at java.net.SocketInputStream.read(SocketInputStream.java:122)
~[?:1.7.0_55]
   at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSess
ionInputBuffer.java:160)
~[httpcore-4.3.jar:4.3]
   at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.jav
a:84)
~[httpcore-4.3.jar:4.3]
   at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessio
nInputBuffer.java:273)
~[httpcore-4.3.jar:4.3]
   at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpR
esponseParser.java:140)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpR
esponseParser.java:57)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.
java:260)
~[httpcore-4.3.jar:4.3]
   at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(Ab
stractHttpClientConnection.java:283)
~[httpcore-4.3.jar:4.3]
   at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(De
faultClientConnection.java:251)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeade
r(ManagedClientConnectionImpl.java:197)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequest
Executor.java:271)
~[httpcore-4.3.jar:4.3]
   at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.j
ava:123)
~[httpcore-4.3.jar:4.3]
   at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultReque
stDirector.java:682)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestD
irector.java:486)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClien
t.java:863)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClien
t.java:82)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClien
t.java:106)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClien
t.java:57)
~[httpclient-4.3.1.jar:4.3.1]
   at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(Co
ncurrentUpdateSolrServer.java:233)
[solr-solrj-4.10.0.jar:4.10.0 1620776 - rjernst - 2014-08-26 20:49:51]
   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1145)
[?:1.7.0_55]
   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
:615)
[?:1.7.0_55]
   at java.lang.Thread.run(Thread.java:745) [?:1.7.0_55]



--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-throwing-SocketException-Connectio
n-Reset-tp4183434.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr pattern tokenizer

2015-02-02 Thread Jim . Musil
It looks to me like you simply want to split the incoming query by the
hyphen, so that it searches for exact codes like this: "CHQ PAID", "INWARD
TRAN", "HDFC LTD".

If that's true, I'd either just change the query at the client to do what
you want, or look into something like the PatternTokenizer:

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternTo
kenizerFactory


Apologies if I'm not understanding your use case.

Thanks,
Jim
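
A hedged sketch of such a field type, splitting only on the hyphens (untested; the type name is made up, and it assumes both the index and query sides should see whole segments like HDFC LTD as single tokens):

<fieldType name="text_hyphen_codes" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*-\s*"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

With this analysis, CHQ PAID-INWARD TRAN-HDFC LTD becomes the three tokens "chq paid", "inward tran" and "hdfc ltd", so a search for HDFC LTD would not match a document that only contains HDFC MF.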

On 2/2/15, 3:56 AM, Nivedita nivedita.pa...@tcs.com wrote:

Hi,

I want to tokenize a query like CHQ PAID-INWARD TRAN-HDFC LTD in such a
way
that it should give me the result document containing HDFC LTD and not HDFC
MF.

How can I do this?
I have already applied the tokenizers below:

 <fieldType name="text_general" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
             maxGramSize="25" side="front"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords.txt"
             ignoreCase="true"/>
     <filter class="solr.TrimFilterFactory"/>
   </analyzer>
 </fieldType>


Please help.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-pattern-tokenizer-tp4183421.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: An interesting approach to grouping

2015-01-27 Thread Jim . Musil
Yes, I’m trying to pin down exactly what conditions cause the bug to
appear. It seems as though it’s only when using the query function.

Jim

On 1/27/15, 12:44 PM, Ryan Josal rjo...@gmail.com wrote:

This is great, thanks Jim.  Your patch worked and the sorting solution
meets the goal, although group.limit seems like it could cut various
results out of the middle of the result set.  I will play around with it
and see if it proves helpful.  Can you let me know the Jira so I can keep
an eye on it?

Ryan

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote:

 Interestingly, you can do something like this:

 group=true
 group.main=true
 group.func=rint(scale(query({!type=edismax v=$q}),0,20)) // puts into
 buckets
 group.limit=20 // gives you 20 from each bucket
 group.sort=category asc  // this will sort by category within each
bucket,
 but this can be a function as well.



 Jim Musil



 On 1/27/15, 10:14 AM, Jim.Musil jim.mu...@target.com javascript:;
 wrote:

 When using group.main=true, the results are not mixed as you expect:
 
 "If true, the result of the last field grouping command is used as the
 main result list in the response, using group.format=simple"
 
 https://wiki.apache.org/solr/FieldCollapsing
 
 
 Jim
 
 On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com javascript:;
 wrote:
 
 Thanks a lot!  I'll try this out later this morning.  If group.func
and
 group.field don't combine the way I think they might, I'll try to look
 for
 a way to put it all in group.func.
 
 On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com
 javascript:; wrote:
 
  I¹m not sure the query you provided will do what you want, BUT I did
 find
  the bug in the code that is causing the NullPointerException.
 
  The variable context is supposed to be global, but when prepare() is
  called, it is only defined in the scope of that function.
 
  Here¹s the simple patch:
 
  Index: core/src/java/org/apache/solr/search/Grouping.java
  ===
  --- core/src/java/org/apache/solr/search/Grouping.java  (revision
 1653358)
  +++ core/src/java/org/apache/solr/search/Grouping.java  (working
copy)
  @@ -926,7 +926,7 @@
*/
   @Override
   protected void prepare() throws IOException {
  -  Map context = ValueSource.newContext(searcher);
  +  context = ValueSource.newContext(searcher);
 groupBy.createWeight(context, searcher);
 actualGroupsToFind = getMax(offset, numGroups, maxDoc);
   }
 
 
  I¹ll search for a Jira issue and open if I can¹t find one.
 
  Jim Musil
 
 
 
  On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com javascript:;
 javascript:;
 wrote:
 
  I have an index of products, and these products have a category
 which we
  can say for now is a good approximation of its location in the
store.
 I'm
  investigating altering the ordering of the results so that the
 categories
  aren't interlaced as much... so that the results are a little bit
more
  grouped by category, but not *totally* grouped by category.  It's
  interesting because it's an approach that sort of compares results
to
  near-scored/ranked results.  One of the hoped outcomes of this
would
 that
  there would be somewhat fewer categories represented in the top
 results
  for
  a given query, although it is questionable if this is a good
 measurement
  to
  determine the effectiveness of the implementation.
  
  My first attempt was to
 
 
  group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20))
  
  Or some FunctionQuery like that, so that in order to become a
member
 of a
  group, the doc would have to have the same category, and be dropped
 into
  the same score bucket (20 in this case).  This doesn't work out of
the
  gate
  due to an NPE (solr 4.10.2) (although I'm not sure it would work
 anyway):
  
  java.lang.NullPointerException\n\tat
 
 
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.get
Va
 l
 ue
  s(ScaleFloatFunction.java:104)\n\tat
 
 
org.apache.solr.search.DoubleParser$Function.getValues(ValueSourcePar
se
 r
 .j
  ava:)\n\tat
 
 
org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingC
ol
 l
 ec
  tor.setNextReader(FunctionFirstPassGroupingCollector.java:82)\n\tat
 
 
org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.
ja
 v
 a:
  113)\n\tat
 
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
\n
 \
 ta
  t
 
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
\n
 \
 ta
  t
 
 
org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:4
51
 )
 \n
  \tat
  org.apache.solr.search.Grouping.execute(Grouping.java:368)\n\tat
 
 
org.apache.solr.handler.component.QueryComponent.process(QueryCompone
nt
 .
 ja
  va:459)\n\tat
 
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(Sea
rc
 h
 Ha
  ndler.java:218)\n\tat
  
  
  Has anyone tried something like

Re: An interesting approach to grouping

2015-01-27 Thread Jim . Musil
Here’s the issue:

https://issues.apache.org/jira/browse/SOLR-7046


Jim

On 1/27/15, 12:44 PM, Ryan Josal rjo...@gmail.com wrote:

This is great, thanks Jim.  Your patch worked and the sorting solution
meets the goal, although group.limit seems like it could cut various
results out of the middle of the result set.  I will play around with it
and see if it proves helpful.  Can you let me know the Jira so I can keep
an eye on it?

Ryan

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote:

 Interestingly, you can do something like this:

 group=true
 group.main=true
 group.func=rint(scale(query({!type=edismax v=$q}),0,20)) // puts into
 buckets
 group.limit=20 // gives you 20 from each bucket
 group.sort=category asc  // this will sort by category within each
bucket,
 but this can be a function as well.



 Jim Musil



 On 1/27/15, 10:14 AM, Jim.Musil jim.mu...@target.com javascript:;
 wrote:

 When using group.main=true, the results are not mixed as you expect:
 
 If true, the result of the last field grouping command is used as the
 main result list in the response, using group.format=simple”
 
 https://wiki.apache.org/solr/FieldCollapsing
 
 
 Jim
 
 On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com javascript:;
 wrote:
 
 Thanks a lot!  I'll try this out later this morning.  If group.func
and
 group.field don't combine the way I think they might, I'll try to look
 for
 a way to put it all in group.func.
 
 On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com
 javascript:; wrote:
 
  I¹m not sure the query you provided will do what you want, BUT I did
 find
  the bug in the code that is causing the NullPointerException.
 
  The variable context is supposed to be global, but when prepare() is
  called, it is only defined in the scope of that function.
 
  Here¹s the simple patch:
 
  Index: core/src/java/org/apache/solr/search/Grouping.java
  ===
  --- core/src/java/org/apache/solr/search/Grouping.java  (revision
 1653358)
  +++ core/src/java/org/apache/solr/search/Grouping.java  (working
copy)
  @@ -926,7 +926,7 @@
*/
   @Override
   protected void prepare() throws IOException {
  -  Map context = ValueSource.newContext(searcher);
  +  context = ValueSource.newContext(searcher);
 groupBy.createWeight(context, searcher);
 actualGroupsToFind = getMax(offset, numGroups, maxDoc);
   }
 
 
  I¹ll search for a Jira issue and open if I can¹t find one.
 
  Jim Musil
 
 
 
  On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com javascript:;
 javascript:;
 wrote:
 
  I have an index of products, and these products have a category
 which we
  can say for now is a good approximation of its location in the
store.
 I'm
  investigating altering the ordering of the results so that the
 categories
  aren't interlaced as much... so that the results are a little bit
more
  grouped by category, but not *totally* grouped by category.  It's
  interesting because it's an approach that sort of compares results
to
  near-scored/ranked results.  One of the hoped outcomes of this
would
 that
  there would be somewhat fewer categories represented in the top
 results
  for
  a given query, although it is questionable if this is a good
 measurement
  to
  determine the effectiveness of the implementation.
  
  My first attempt was to
 
 
  group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20))
  
  Or some FunctionQuery like that, so that in order to become a
member
 of a
  group, the doc would have to have the same category, and be dropped
 into
  the same score bucket (20 in this case).  This doesn't work out of
the
  gate
  due to an NPE (solr 4.10.2) (although I'm not sure it would work
 anyway):
  
  java.lang.NullPointerException\n\tat
 
 
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.get
Va
 l
 ue
  s(ScaleFloatFunction.java:104)\n\tat
 
 
org.apache.solr.search.DoubleParser$Function.getValues(ValueSourcePar
se
 r
 .j
  ava:)\n\tat
 
 
org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingC
ol
 l
 ec
  tor.setNextReader(FunctionFirstPassGroupingCollector.java:82)\n\tat
 
 
org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.
ja
 v
 a:
  113)\n\tat
 
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
\n
 \
 ta
  t
 
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
\n
 \
 ta
  t
 
 
org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:4
51
 )
 \n
  \tat
  org.apache.solr.search.Grouping.execute(Grouping.java:368)\n\tat
 
 
org.apache.solr.handler.component.QueryComponent.process(QueryCompone
nt
 .
 ja
  va:459)\n\tat
 
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(Sea
rc
 h
 Ha
  ndler.java:218)\n\tat
  
  
  Has anyone tried something like this before, and does anyone have
any
  novel
  ideas for how to approach

Re: An interesting approach to grouping

2015-01-27 Thread Jim . Musil
I'm not sure the query you provided will do what you want, BUT I did find
the bug in the code that is causing the NullPointerException.

The variable context is supposed to be global, but when prepare() is
called, it is only defined in the scope of that function.

Here's the simple patch:

Index: core/src/java/org/apache/solr/search/Grouping.java
===
--- core/src/java/org/apache/solr/search/Grouping.java  (revision 1653358)
+++ core/src/java/org/apache/solr/search/Grouping.java  (working copy)
@@ -926,7 +926,7 @@
  */
 @Override
 protected void prepare() throws IOException {
-  Map context = ValueSource.newContext(searcher);
+  context = ValueSource.newContext(searcher);
   groupBy.createWeight(context, searcher);
   actualGroupsToFind = getMax(offset, numGroups, maxDoc);
 }


I'll search for a Jira issue and open if I can't find one.

Jim Musil



On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com wrote:

I have an index of products, and these products have a category which we
can say for now is a good approximation of its location in the store.  I'm
investigating altering the ordering of the results so that the categories
aren't interlaced as much... so that the results are a little bit more
grouped by category, but not *totally* grouped by category.  It's
interesting because it's an approach that sort of compares results to
near-scored/ranked results.  One of the hoped outcomes of this would be that
there would be somewhat fewer categories represented in the top results
for
a given query, although it is questionable if this is a good measurement
to
determine the effectiveness of the implementation.

My first attempt was to
group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20))

Or some FunctionQuery like that, so that in order to become a member of a
group, the doc would have to have the same category, and be dropped into
the same score bucket (20 in this case).  This doesn't work out of the
gate
due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):

java.lang.NullPointerException\n\tat
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValue
s(ScaleFloatFunction.java:104)\n\tat
org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.j
ava:)\n\tat
org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollec
tor.setNextReader(FunctionFirstPassGroupingCollector.java:82)\n\tat
org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:
113)\n\tat
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)\n\ta
t
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)\n\ta
t
org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)\n
\tat
org.apache.solr.search.Grouping.execute(Grouping.java:368)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.ja
va:459)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHa
ndler.java:218)\n\tat


Has anyone tried something like this before, and does anyone have any
novel
ideas for how to approach it, no matter how different?  How about a
workaround for the group.func error here?  I'm very open-minded about
where
to go on this one.

Thanks,
Ryan



Re: An interesting approach to grouping

2015-01-27 Thread Jim . Musil
When using group.main=true, the results are not mixed as you expect:

"If true, the result of the last field grouping command is used as the
main result list in the response, using group.format=simple"

https://wiki.apache.org/solr/FieldCollapsing


Jim

On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com wrote:

Thanks a lot!  I'll try this out later this morning.  If group.func and
group.field don't combine the way I think they might, I'll try to look for
a way to put it all in group.func.

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote:

 I¹m not sure the query you provided will do what you want, BUT I did
find
 the bug in the code that is causing the NullPointerException.

 The variable context is supposed to be global, but when prepare() is
 called, it is only defined in the scope of that function.

 Here¹s the simple patch:

 Index: core/src/java/org/apache/solr/search/Grouping.java
 ===
 --- core/src/java/org/apache/solr/search/Grouping.java  (revision
1653358)
 +++ core/src/java/org/apache/solr/search/Grouping.java  (working copy)
 @@ -926,7 +926,7 @@
   */
  @Override
  protected void prepare() throws IOException {
 -  Map context = ValueSource.newContext(searcher);
 +  context = ValueSource.newContext(searcher);
groupBy.createWeight(context, searcher);
actualGroupsToFind = getMax(offset, numGroups, maxDoc);
  }


 I¹ll search for a Jira issue and open if I can¹t find one.

 Jim Musil



 On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com javascript:; wrote:

 I have an index of products, and these products have a category
which we
 can say for now is a good approximation of its location in the store.
I'm
 investigating altering the ordering of the results so that the
categories
 aren't interlaced as much... so that the results are a little bit more
 grouped by category, but not *totally* grouped by category.  It's
 interesting because it's an approach that sort of compares results to
 near-scored/ranked results.  One of the hoped outcomes of this would
that
 there would be somewhat fewer categories represented in the top results
 for
 a given query, although it is questionable if this is a good
measurement
 to
 determine the effectiveness of the implementation.
 
 My first attempt was to
 
 group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20))
 
 Or some FunctionQuery like that, so that in order to become a member
of a
 group, the doc would have to have the same category, and be dropped
into
 the same score bucket (20 in this case).  This doesn't work out of the
 gate
 due to an NPE (solr 4.10.2) (although I'm not sure it would work
anyway):
 
 java.lang.NullPointerException\n\tat
 
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getVal
ue
 s(ScaleFloatFunction.java:104)\n\tat
 
org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser
.j
 ava:)\n\tat
 
org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingColl
ec
 tor.setNextReader(FunctionFirstPassGroupingCollector.java:82)\n\tat
 
org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.jav
a:
 113)\n\tat
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)\n\
ta
 t
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)\n\
ta
 t
 
org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)
\n
 \tat
 org.apache.solr.search.Grouping.execute(Grouping.java:368)\n\tat
 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.
ja
 va:459)\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search
Ha
 ndler.java:218)\n\tat
 
 
 Has anyone tried something like this before, and does anyone have any
 novel
 ideas for how to approach it, no matter how different?  How about a
 workaround for the group.func error here?  I'm very open-minded about
 where
 to go on this one.
 
 Thanks,
 Ryan





Re: An interesting approach to grouping

2015-01-27 Thread Jim . Musil
Interestingly, you can do something like this:

group=true
group.main=true
group.func=rint(scale(query({!type=edismax v=$q}),0,20)) // puts into
buckets
group.limit=20 // gives you 20 from each bucket
group.sort=category asc  // this will sort by category within each bucket,
but this can be a function as well.



Jim Musil



On 1/27/15, 10:14 AM, Jim.Musil jim.mu...@target.com wrote:

When using group.main=true, the results are not mixed as you expect:

If true, the result of the last field grouping command is used as the
main result list in the response, using group.format=simple”

https://wiki.apache.org/solr/FieldCollapsing


Jim

On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com wrote:

Thanks a lot!  I'll try this out later this morning.  If group.func and
group.field don't combine the way I think they might, I'll try to look
for
a way to put it all in group.func.

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote:

 I¹m not sure the query you provided will do what you want, BUT I did
find
 the bug in the code that is causing the NullPointerException.

 The variable context is supposed to be global, but when prepare() is
 called, it is only defined in the scope of that function.

 Here¹s the simple patch:

 Index: core/src/java/org/apache/solr/search/Grouping.java
 ===
 --- core/src/java/org/apache/solr/search/Grouping.java  (revision
1653358)
 +++ core/src/java/org/apache/solr/search/Grouping.java  (working copy)
 @@ -926,7 +926,7 @@
   */
  @Override
  protected void prepare() throws IOException {
 -  Map context = ValueSource.newContext(searcher);
 +  context = ValueSource.newContext(searcher);
groupBy.createWeight(context, searcher);
actualGroupsToFind = getMax(offset, numGroups, maxDoc);
  }


 I¹ll search for a Jira issue and open if I can¹t find one.

 Jim Musil



 On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com javascript:;
wrote:

 I have an index of products, and these products have a category
which we
 can say for now is a good approximation of its location in the store.
I'm
 investigating altering the ordering of the results so that the
categories
 aren't interlaced as much... so that the results are a little bit more
 grouped by category, but not *totally* grouped by category.  It's
 interesting because it's an approach that sort of compares results to
 near-scored/ranked results.  One of the hoped outcomes of this would
that
 there would be somewhat fewer categories represented in the top
results
 for
 a given query, although it is questionable if this is a good
measurement
 to
 determine the effectiveness of the implementation.
 
 My first attempt was to
 
 group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20))
 
 Or some FunctionQuery like that, so that in order to become a member
of a
 group, the doc would have to have the same category, and be dropped
into
 the same score bucket (20 in this case).  This doesn't work out of the
 gate
 due to an NPE (solr 4.10.2) (although I'm not sure it would work
anyway):
 
 java.lang.NullPointerException\n\tat
 
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getVa
l
ue
 s(ScaleFloatFunction.java:104)\n\tat
 
org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParse
r
.j
 ava:)\n\tat
 
org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCol
l
ec
 tor.setNextReader(FunctionFirstPassGroupingCollector.java:82)\n\tat
 
org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.ja
v
a:
 113)\n\tat
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)\n
\
ta
 t
 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)\n
\
ta
 t
 
org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451
)
\n
 \tat
 org.apache.solr.search.Grouping.execute(Grouping.java:368)\n\tat
 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent
.
ja
 va:459)\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(Searc
h
Ha
 ndler.java:218)\n\tat
 
 
 Has anyone tried something like this before, and does anyone have any
 novel
 ideas for how to approach it, no matter how different?  How about a
 workaround for the group.func error here?  I'm very open-minded about
 where
 to go on this one.
 
 Thanks,
 Ryan






Re: Indexed epoch time in Solr

2015-01-26 Thread Jim . Musil
If you are using the DataImportHandler, you can leverage one of the
transformers, such as the DateFormatTransformer:

http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer


If you are updating documents directly you can define a regex
transformation in your schema.xml:

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternRe
placeCharFilterFactory


If you have control over the input, then I always find it better to just
transform it prior to sending it into solr.

Jim
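
If the conversion is done on the client before indexing, a small sketch along these lines may help (field names are illustrative; it assumes the epoch value is in milliseconds and a Java 8+ runtime):

import java.time.Instant;
import org.apache.solr.common.SolrInputDocument;

public class EpochToIsoDate {
    // Builds a document whose date field holds an ISO-8601 UTC timestamp,
    // which Solr's standard date fields accept (e.g. 2015-01-26T12:00:00Z).
    public static SolrInputDocument toDoc(String id, long epochMillis) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("created_dt", Instant.ofEpochMilli(epochMillis).toString());
        return doc;
    }
}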

On 1/25/15, 11:35 PM, Ahmed Adel ahmed.a...@badrit.com wrote:

Hi All,

Is there a way to convert unix time field that is already indexed to
ISO-8601 format in query response? If this is not possible on the query
level, what is the best way to copy this field to a new Solr standard date
field.

Thanks,

-- 
*Ahmed Adel*
http://s.wisestamp.com/links?url=http%3A%2F%2Fwww.linkedin.com%2Fin%2F



Re: Filter cache pollution during sharded edismax queries

2014-10-01 Thread jim ferenczi
I think you should test with facet.shard.limit=-1; this will remove the
limit for the facets on the shards and remove the need for facet
refinements. I bet that returning every facet with a count greater than 0
on internal queries is cheaper than using the filter cache to handle a lot
of refinements.

Jim

2014-10-01 10:24 GMT+02:00 Charlie Hull char...@flax.co.uk:

 On 30/09/2014 22:25, Erick Erickson wrote:

 Just from a 20,000 ft. view, using the filterCache this way seems...odd.

 +1 for using a different cache, but that's being quite unfamiliar with the
 code.


 Here's a quick update:

 1. LFUCache performs worse so we returned to LRUCache
 2. Making the cache smaller than the default 512 reduced performance.
 3. Raising the cache size to 2048 didn't seem to have a significant effect
 on performance but did reduce CPU load significantly. This may help our
 client as they can reduce their system spec considerably.

 We're continuing to test with our client, but the upshot is that even if
 you think you don't need the filter cache, if you're doing distributed
 faceting you probably do, and you should size it based on experimentation.
 In our case there is a single filter but the cache needs to be considerably
 larger than that!

 Cheers

 Charlie



 On Tue, Sep 30, 2014 at 1:53 PM, Alan Woodward a...@flax.co.uk wrote:



  Once all the facets have been gathered, the co-ordinating node then
 asks
 the subnodes for an exact count for the final top-N facets,



 What's the point to refine these counts? I've thought that it make sense
 only for facet.limit ed requests. Is it correct statement? can those who
 suffer from the low performance, just unlimit  facet.limit to avoid that
 distributed hop?


 Presumably yes, but if you've got a sufficiently high cardinality field
 then any gains made by missing out the hop will probably be offset by
 having to stream all the return values back again.

 Alan


  --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com






 --
 Charlie Hull
 Flax - Open Source Enterprise Search

 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web: www.flax.co.uk



Re: FAST-like document vector data structures in Solr?

2014-09-05 Thread jim ferenczi
Hi,
Something like ?:
https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
And just to show some impressive search functionality of the wiki: ;)
https://cwiki.apache.org/confluence/dosearchsite.action?where=solrspaceSearch=truequeryString=document+vectors

Cheers,
Jim
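
For example, with the term vector request handler from the sample configs, a request might look like this (handler name, core and field are illustrative, and the field has to be indexed with termVectors="true"):

http://localhost:8983/solr/collection1/tvrh?q=id:123&tv=true&tv.tf=true&tv.df=true&tv.fl=text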


2014-09-05 9:44 GMT+02:00 Jürgen Wagner (DVT) juergen.wag...@devoteam.com
:

 Hello all,
   as the migration from FAST to Solr is a relevant topic for several of
 our customers, there is one issue that does not seem to be addressed by
 Lucene/Solr: document vectors FAST-style. These document vectors are
 used to form metrics of similarity, i.e., they may be used as a
 semantic fingerprint of documents to define similarity relations. I
 can think of several ways of approximating a mapping of this mechanism
 to Solr, but there are always drawbacks - mostly performance-wise.

 Has anybody else encountered and possibly approached this challenge so far?

 Is there anything in the roadmap of Solr that has not revealed itself to
 me, addressing this issue?

 Your input is greatly appreciated!

 Cheers,
 --Jürgen




Re: Incorrect group.ngroups value

2014-08-22 Thread jim ferenczi
Hi Bryan,
This is a known limitation of grouping.
https://wiki.apache.org/solr/FieldCollapsing#RequestParameters

group.ngroups:


*WARNING: If this parameter is set to true on a sharded environment, all
the documents that belong to the same group have to be located in the same
shard, otherwise the count will be incorrect. If you are using SolrCloud
https://wiki.apache.org/solr/SolrCloud, consider using custom hashing*

Cheers,
Jim
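
As a sketch of the custom hashing idea: with the default compositeId router, everything before the "!" in a document id is hashed to pick the shard, so giving all documents of a group the same prefix co-locates them (values are illustrative):

id = "groupA!doc1"
id = "groupA!doc2"   <- both land on the same shard, so ngroups stays consistent for that group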



2014-08-21 21:44 GMT+02:00 Bryan Bende bbe...@gmail.com:

 Is there any known issue with using group.ngroups in a distributed Solr
 using version 4.8.1 ?

 I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing several
 queries where ngroups will be more than the actual groups returned in the
 response. For example, ngroups will say 5, but then there will be 3 groups
 in the response. It is not happening on all queries, only some.



Re: Bloom filter

2014-07-30 Thread jim ferenczi
Hi Per,
First of all, the BloomFilter implementation in Lucene is not exactly a
bloom filter. It uses only one hash function and you cannot set the false
positive ratio beforehand. ElasticSearch has its own bloom filter
implementation (a Guava-like BloomFilter); you should take a look at
their implementation if you really need this feature.
What is your use-case ? If your index fits in RAM the bloom filter won't
help (and it may have a negative impact if you have a lot of segments). In
fact the only use case where the bloom filter can help is when your term
dictionary does not fit in RAM which is rarely the case.

Regards,
Jim



2014-07-28 16:13 GMT+02:00 Per Steffensen st...@designware.dk:

 Yes I found that one, along with SOLR-3950. Well at least it seems like
 the support is there in Lucene. I will figure out myself how to make it
 work via Solr, the way I need it to work. My use-case is not as specified
 in SOLR-1375, but the solution might be the same. Any input is of course
 still very much appreciated.

 Regards, Per Steffensen


 On 28/07/14 15:42, Lukas Drbal wrote:

 Hi Per,

 link to jira - https://issues.apache.org/jira/browse/SOLR-1375 Unresolved
 ;-)

 L.


 On Mon, Jul 28, 2014 at 1:17 PM, Per Steffensen st...@designware.dk
 wrote:

  Hi

 Where can I find documentation on how to use Bloom filters in Solr (4.4).
 http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated -
 there is no BloomIndexComponent included in 4.4 code.

 Regards, Per Steffensen







Re: Compression vs FieldCache for doc ids retrieval

2014-06-02 Thread jim ferenczi
@William Firstly because I was sure that the ticket (or an equivalent) was
already opened but I just could not find it. Thanks @Manuel. Secondly
because I wanted to start the discussion, I have the feeling that the
compression of the documents, activated by default, can be a killer for
some applications (if the number of shards is big or if you have a lot of
deep paging queries) and I wanted to check if someone noticed the problem
in a benchmark. Let's say that you have 10 shards and you want to return 10
documents per request, in the first stage of the search each shard would
need to decompress 10 blocks of 16k each whereas the second stage would
need to decompress only 10 blocks total. This makes me believe that this
patch should be the default behaviour for any distributed search in Solr (I
mean more than 1 shard).
Maybe it's better to continue the discussion on the ticket created by
Manuel, but still, I think that it could speed up every queries (not only
deep paging queries like in the patch proposed in Manuel's ticket).

Jim



2014-06-01 14:06 GMT+09:00 William Bell billnb...@gmail.com:

 Why not just submit a JIRA issue - and add your patch so that we can all
 benefit?


 On Fri, May 30, 2014 at 5:34 AM, Manuel Le Normand 
 manuel.lenorm...@gmail.com wrote:

  Is the issue SOLR-5478 what you were looking for?
 



 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Does CloudSolrServer hit zookeeper for every request?

2014-06-02 Thread Jim . Musil
I’m curious how CloudSolrServer works in practice.

I understand that it gets the active solr nodes from zookeeper, but does it do 
this for every request?

If it does hit zk for every request, that seems to put a lot of pressure on the 
zk ensemble.

If it does NOT hit zk for every request, then how does it detect changes in the 
number of nodes and the status of the nodes?

Thanks!
Jim M.
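
For reference, the client in question is constructed against the ZooKeeper ensemble rather than any single Solr node (a sketch; hosts and collection name are illustrative):

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setDefaultCollection("collection1");
server.connect();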


Status of configName in core.properties

2014-05-30 Thread Jim . Musil
Hi,

I’m attempting to define a core using the new core discovery method described 
here:

http://wiki.apache.org/solr/Core%20Discovery%20(4.4%20and%20beyond)

At the bottom of the page is a parameter named configName that should allow me 
to specify a configuration name to use for a collection. This does not seem to 
be working. I have a configuration uploaded to zookeeper with a name. I want to 
share that configuration between two cores, but it is only linking to the one 
with the same exact name.

This parameter is marked as "Tentative" for 4.6. What is the status?

Thanks!
Jim


Compression vs FieldCache for doc ids retrieval

2014-05-26 Thread jim ferenczi
Dear Solr users,

we migrated our solution from Solr 4.0 to Solr 4.3 and we noticed a
degradation of the search performance. We compared the two versions and
found out that most of the time is spent in the decompression of the
retrievable fields in Solr 4.3. The block compression of the documents is a
great feature for us because it reduces the size of our index but we don’t
have enough resources (I mean cpus) to safely migrate to the new version.
In order to reduce the cost of the decompression we tried a simple patch in
the BinaryResponseWriter; during the first phase of the distributed search
the response writer gets the documents from the index reader to only
extract the doc ids of the top N results. Our patch uses the field cache to
get the doc ids during the first phase and thus replaces a full
decompression of 16k blocks (for a single document) by a simple get in an
array (the field cache or the doc values). Thanks to this patch we are now
able to handle the same number of QPS than before (with Solr 4.0). Of
course the document cache could help as well, but not as much as one
would have thought (mainly because we have a lot of deep paging queries).

I am sure that the idea we implemented is not new but I haven’t seen any
Jira about it. Should we create one (I mean does it have a chance to be
included in future release of Solr or does anybody already working on this)
?

Cheers,

Jim


Re: ContributorsGroup add request

2014-05-13 Thread Jim Martin

Shawn-
   Thanks much. Icon ideas have been recorded.
-Jim

On 5/11/2014 10:39 AM, Shawn Heisey wrote:

On 5/10/2014 6:35 PM, Jim Martin wrote:

Please add me the ContributorsGroup; I've got some Solr icons I'd
like to suggest to the community. Perhaps down the road I can contribute
more. I'm the team lead at Overstock.Com for search, and Solr is the
foundation of what we do.

Username: JamesMartin

I went to add you, but someone else has already done so.

It's entirely possible that because of the Apache email outage, they
have already replied, but the message hasn't made it through to the list
yet.  I'm adding you as a CC here (which I normally don't do) so that
you'll get notified faster.

Thanks,
Shawn






ContributorsGroup add request

2014-05-10 Thread Jim Martin

Greetings-

   Please add me to the ContributorsGroup; I've got some Solr icons I'd 
like to suggest to the community. Perhaps down the road I can contribute 
more. I'm the team lead at Overstock.Com for search, and Solr is the 
foundation of what we do.


   Username: JamesMartin

Thanks,
-Jim


Re: Memory + WeakIdentityMap

2014-03-21 Thread jim ferenczi
Hi,
If you are not on windows, you can try to disable the tracking of clones in
the MMapDirectory by setting unmap to false in your solrconfig.xml:


<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory">
  <bool name="unmap">false</bool>
</directoryFactory>
The MMapDirectory keeps track of all clones in a weak map and forces the
unmapping of the buffers on close. This was added because on Windows
mmapped files cannot be modified or deleted. If unmap is false the weak map
is not created and the weak references you see in your heap should disappear
as well.
You can find more informations here:
https://issues.apache.org/jira/browse/LUCENE-4740

Thanks,
Jim






2014-03-21 6:56 GMT+01:00 Shawn Heisey s...@elyograg.org:

 On 3/20/2014 6:54 PM, Harish Agarwal wrote:
  I'm transitioning my index from a 3.x version to 4.6.  I'm running a
 large
  heap (20G), primarily to accomodate a large facet cache (~5G), but have
  been able to run it on 3.x stably.
 
  On 4.6.0 after stress testing I'm finding that all of my shards are
  spending all of their time in GC.  After taking a heap dump and
 analyzing,
  it appears that org.apache.lucene.util.WeakIdentityMap is using many Gs
 of
  memory.  Does anyone have any insight into which Solr component(s) use
 this
  and whether this kind of memory consumption is to be expected?

 I can't really say what WeakIdentityMap is doing.  I can trace the only
 usage in Lucene to MMapDirectory, but it doesn't make a lot of sense for
 this to use a lot of memory, unless this is the source of the memory
 misreporting that Java 7 seems to do with MMap.  See this message in a
 recent thread on this mailing list:


 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c53285ca1.9000...@elyograg.org%3E

 If you have a lot of facets, one approach for performance is to use
 facet.method=enum so that your Java heap does not need to be super large.

 This does not actually reduce the overall system memory requirements.
 It just shifts the responsibility for caching to the operating system
 instead of Solr, and requires that you have enough memory to put a
 majority of the index into the OS disk cache.  Ideally, there would be
 enough RAM for the entire index to fit.

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Another option for facet memory optimization is docValues.  One caveat:
 It is my understanding that the docValues content is the same as a
 stored field.  Depending on your schema definition, this may be
 different than the indexed values that facets normally use.  The
 docValues feature also helps with sorting.

 Thanks,
 Shawn




RE: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Beale, Jim (US-KOP)
Hi David,

I finally got back to this again, after getting sidetracked for a couple of 
weeks.

I implemented things in accordance with my understanding of what you wrote 
below.  Using SolrJ, the code to index the spatial field is as follows,

private void addSpatialField(double lat, double lon, SolrInputDocument 
document) {
   StringBuilder sb = new StringBuilder();
   sb.append(lat).append(",").append(lon);
   document.addField("location", sb.toString());
}

Using Solr 4.3.1 and spatial4j 0.3, I am getting the following error in the 
solr logs:

1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore - 
org.apache.solr.common.SolrException: 
com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
Pt(x=-72.544123,y=41.85)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144)
at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118)

spatial4j 0.3 is looking for something like POINT but Solr is converting my 
lat,long to Pt(x=-72.544123,y=41.85).

Version mismatch?

Thanks in advance for your help!

Jim Beale

From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Monday, January 13, 2014 11:30 AM
To: Beale, Jim (US-KOP); solr-user@lucene.apache.org
Subject: Re: Indexing spatial fields into SolrCloud (HTTP)

Hello Jim,

By the way, using GeohashPrefixTree.getMaxLevelsPossible() is usually an 
extreme choice.  Instead you probably want to choose only as many levels needed 
for your distance tolerance.  See SpatialPrefixTreeFactory which you can use 
outright or borrow the code it uses.

Looking at your code, I see you are coding against Solr directly in Java 
instead of where most people do this as an HTTP web service.  But it's unclear 
in what context your code runs because you are getting into the guts of things 
that you normally don't have to do, even if your using SolrJ or writing some 
sort of UpdateRequestProcessor.  Simply configure the field type in schema.xml 
appropriately, and then to index a point simply give Solr a string for the 
field in latitude, longitude format.  I don't know why you are using 
field.tokenStream(analyzer) for the field value - that is clearly wrong and the 
cause of the error.  I think your confusion more has to do with differences in 
coding to Lucene versus Solr;  this being an actual spatial concern.  You 
referenced SpatialDemoUpdateProcessorFactory so I see you have looked at 
SolrSpatialSandbox on GitHub.  That particular URP should get some warnings 
added to it in the code to suggest that you probably should not do what it does.  
If you look at the solrconfig.xml that configures it, there is a warning as 
follows:
  !-- spatial
   Only needed for an SpatialDemoUpdateProcessorFactory which copies  spatial 
objects from one field
to other spatial fields in object form to avoid redundant/inefficient 
string to spatial object de-serialization.
   --
Even if you have a similar circumstance, your code doesn't quite look like 
this URP.  You shouldn't need to reference the SpatialStrategy, for example.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.commailto:jim.be...@hibu.com
Date: Friday, January 10, 2014 at 12:15 PM
To: solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org 
solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org
Cc: Smiley, David W. dsmi...@mitre.orgmailto:dsmi...@mitre.org
Subject: Indexing spatial fields into SolrCloud (HTTP)

I am porting an application from Lucene to Solr which makes use of spatial4j 
for distance searches.  The Lucene version works correctly but I am having a 
problem getting the Solr version to work in the same way.

Lucene version:

SpatialContext geoSpatialCtx = SpatialContext.GEO;

   geoSpatialStrategy = new RecursivePrefixTreeStrategy(new 
GeohashPrefixTree(
 geoSpatialCtx, GeohashPrefixTree.getMaxLevelsPossible()), 
DocumentFieldNames.LOCATION);


   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  document.add(field);
   }

   //Store the field
   document.add(new StoredField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point)));

Solr version:

   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  try {
 solrDocument.addField(field.name(), 
field.tokenStream(analyzer));
  } catch (IOException e) {
 LOGGER.error("Failed to add geo field to Solr index", e);
  }
   }

   // Store the field
   solrDocument.addField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point));

The server-side error is as follows:

Caused by: com.spatial4j.core.exception.InvalidShapeException: Unable to read

RE: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Beale, Jim (US-KOP)
Hi David,

You wrote:

 Perhaps you’ve got some funky UpdateRequestProcessor from experimentation 
 you’ve done that’s parsing then toString’ing it?

No, nothing at all.  The update processing is straight out-of-the-box Solr.

 And also, your stack trace should have more to it than what you present here.

I trimmed the stack trace because it seemed like TMI, but here it is for 
completeness’ sake:

1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore - 
org.apache.solr.common.SolrException: 
com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
Pt(x=-72.544123,y=41.85)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144)
at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118)
at 
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:186)
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:257)
at 
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:199)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:504)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:640)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
…..

 Your SolrJ code looks totally fine.

Finally, I changed the code to the following,

private void addSpatialLcnFields(double lat, double lon, SolrInputDocument 
document) {
   Point point = geoSpatialCtx.makePoint(lon, lat);
   document.addField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point));
}

and at least it isn’t throwing exceptions now.  I’m not sure what is going into 
the index yet.  I’ll have to wait for it to finish.  (

Does that code seem correct?  I want to avoid the deprecated API but so far I 
haven’t found any alternatives which work.

Thanks,

Jim Beale

From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, February 12, 2014 3:07 PM
To: Beale, Jim (US-KOP); solr-user@lucene.apache.org
Subject: Re: Indexing spatial fields into SolrCloud (HTTP)

That’s pretty weird.  It appears that somehow a Spatial4j Point class is having 
its toString() called on it (which looks like Pt(x=-72.544123,y=41.85) ) 
and then Spatial4j is trying to parse this, which isn’t in a valid format — the 
toString is more for debug-ability.  Your SolrJ code looks totally fine.  Perhaps 
you’ve got some funky UpdateRequestProcessor from experimentation you’ve done 
that’s parsing then toString’ing it?  And also, your stack trace should have 
more to it than what you present here.  It may be on the server side, versus the 
HTTP error page, which can get abbreviated.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com<mailto:jim.be...@hibu.com>
Date: Wednesday, February 12, 2014 at 2:05 PM
To: Smiley, David W. dsmi...@mitre.org<mailto:dsmi...@mitre.org>, 
solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
Subject: RE: Indexing spatial fields into SolrCloud (HTTP)

Hi David,

I finally got back to this again, after getting sidetracked for a couple of 
weeks.

I implemented things in accordance with my understanding of what you wrote 
below.  Using SolrJ, the code to index the spatial field is as follows,

private void addSpatialField(double lat, double lon

RE: Indexing spatial fields into SolrCloud (HTTP)

2014-01-27 Thread Beale, Jim (US-KOP)
Hi David,

It's taken me ages to get back to this.

Thanks for your help!  I guess I overly complicated the indexing of spatial 
fields in Solr.  I should have known that Solr would just do the right thing! :)

Thanks,
Jim

From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Monday, January 13, 2014 11:30 AM
To: Beale, Jim (US-KOP); solr-user@lucene.apache.org
Subject: Re: Indexing spatial fields into SolrCloud (HTTP)

Hello Jim,

By the way, using GeohashPrefixTree.getMaxLevelsPossible() is usually an 
extreme choice.  Instead you probably want to choose only as many levels as 
needed for your distance tolerance.  See SpatialPrefixTreeFactory which you can 
use outright or borrow the code it uses.

Looking at your code, I see you are coding against Solr directly in Java 
instead of where most people do this as an HTTP web service.  But it's unclear 
in what context your code runs because you are getting into the guts of things 
that you normally don't have to do, even if you're using SolrJ or writing some 
sort of UpdateRequestProcessor.  Simply configure the field type in schema.xml 
appropriately, and then to index a point simply give Solr a string for the 
field in latitude, longitude format.  I don't know why you are using 
field.tokenStream(analyzer) for the field value - that is clearly wrong and the 
cause of the error.  I think your confusion has more to do with differences in 
coding to Lucene versus Solr than with this being an actual spatial concern.  
You referenced SpatialDemoUpdateProcessorFactory so I see you have looked at 
SolrSpatialSandbox on GitHub.  That particular URP should get some warnings 
added to it in the code to suggest that you probably shouldn't do what it does.  
If you look at the solrconfig.xml that configures it, there is a warning as 
follows:
  <!-- spatial
   Only needed for an SpatialDemoUpdateProcessorFactory which copies spatial 
objects from one field to other spatial fields in object form to avoid 
redundant/inefficient string to spatial object de-serialization.
   -->
Even if you have a similar circumstance, your code doesn't quite look like 
this URP.  You shouldn't need to reference the SpatialStrategy, for example.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com<mailto:jim.be...@hibu.com>
Date: Friday, January 10, 2014 at 12:15 PM
To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
Cc: Smiley, David W. dsmi...@mitre.org<mailto:dsmi...@mitre.org>
Subject: Indexing spatial fields into SolrCloud (HTTP)

I am porting an application from Lucene to Solr which makes use of spatial4j 
for distance searches.  The Lucene version works correctly but I am having a 
problem getting the Solr version to work in the same way.

Lucene version:

SpatialContext geoSpatialCtx = SpatialContext.GEO;

   geoSpatialStrategy = new RecursivePrefixTreeStrategy(new 
GeohashPrefixTree(
 geoSpatialCtx, GeohashPrefixTree.getMaxLevelsPossible()), 
DocumentFieldNames.LOCATION);


   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  document.add(field);
   }

   //Store the field
   document.add(new StoredField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point)));

Solr version:

   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  try {
 solrDocument.addField(field.name(), 
field.tokenStream(analyzer));
  } catch (IOException e) {
 LOGGER.error("Failed to add geo field to Solr index", e);
  }
   }

   // Store the field
   solrDocument.addField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point));

The server-side error is as follows:

Caused by: com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
org.apache.lucene.spatial.prefix.PrefixTreeStrategy$CellTokenStream@0
at 
com.spatial4j.core.io.ShapeReadWriter.readShape(ShapeReadWriter.java:48)
at 
com.spatial4j.core.context.SpatialContext.readShape(SpatialContext.java:195)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:142)

I've seen David Smiley's sample code, specifically the class, 
SpatialDemoUpdateProcessorFactory, but I can't say that I was able to benefit 
from it at all.

What I'm trying to do seems like it should be easy - just to index a point for 
distance searching - but I'm obviously missing something.

Any ideas?
Thanks,
Jim

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute

Indexing spatial fields into SolrCloud (HTTP)

2014-01-10 Thread Beale, Jim (US-KOP)
I am porting an application from Lucene to Solr which makes use of spatial4j 
for distance searches.  The Lucene version works correctly but I am having a 
problem getting the Solr version to work in the same way.

Lucene version:

SpatialContext geoSpatialCtx = SpatialContext.GEO;

   geoSpatialStrategy = new RecursivePrefixTreeStrategy(new 
GeohashPrefixTree(
 geoSpatialCtx, GeohashPrefixTree.getMaxLevelsPossible()), 
DocumentFieldNames.LOCATION);


   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  document.add(field);
   }

   //Store the field
   document.add(new StoredField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point)));

Solr version:

   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  try {
 solrDocument.addField(field.name(), 
field.tokenStream(analyzer));
  } catch (IOException e) {
 LOGGER.error("Failed to add geo field to Solr index", e);
  }
   }

   // Store the field
   solrDocument.addField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point));

The server-side error is as follows:

Caused by: com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
org.apache.lucene.spatial.prefix.PrefixTreeStrategy$CellTokenStream@0
at 
com.spatial4j.core.io.ShapeReadWriter.readShape(ShapeReadWriter.java:48)
at 
com.spatial4j.core.context.SpatialContext.readShape(SpatialContext.java:195)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:142)

I've seen David Smiley's sample code, specifically the class, 
SpatialDemoUpdateProcessorFactory, but I can't say that I was able to benefit 
from it at all.

What I'm trying to do seems like it should be easy - just to index a point for 
distance searching - but I'm obviously missing something.

Any ideas?
Thanks,
Jim

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.


Re: Prioritize search returns by URL path?

2013-12-12 Thread Jim Glynn
Thanks Chris. I think you've hit the nail on the head.

I understand your concern about prioritizing content simply by content type,
and generally I'd agree with you. However, our situation is a bit unusual.
We don't use our Wiki feature as true wikis. We publish only authoritative
content to them, and to our blogs, so those really are the things we want
returned first. The wikis most often contain the information we want our
customers to find.

Thanks again for the syntax help. We'll give it a try.
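
What we are going to try is roughly along these lines: an edismax boost query 
per content type plus a boost function on the Helpful vote count (the field 
names are placeholders for our schema, and we derive the content type from the 
URL path at index time rather than querying the path directly):

  defType=edismax
  &bq=(content_type:wiki^10 OR content_type:blog^5)
  &bf=log(sum(helpful_votes,1))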

JRG



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023p4106481.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Prioritize search returns by URL path?

2013-12-06 Thread Jim Glynn
Thanks all. Yes, we can differentiate between content types by URL.
Everything else being equal, Wiki posts should always be returned higher
than blog posts, and blog posts should always be returned higher than forum
posts.

Within forum posts, we want to rank Verified answered and Suggested answered
posts higher than unanswered posts. These cannot be identified via path -
only via metadata attached to the individual post. Any suggestions?

@Alex, I'll investigate the references you provided. Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023p4105426.html
Sent from the Solr - User mailing list archive at Nabble.com.


Prioritize search returns by URL path?

2013-12-04 Thread Jim Glynn
We have a Telligent based community with Solr as the search engine. We want
to prioritize search returns from within the community by the type of
content: Wiki articles as most relevant, then blog posts, then Verified
answer and Suggested answer forum posts, then remaining forum posts. We have
also implemented a Helpful voting capability and would like to boost items
with more Helpful votes above those within their same category with fewer
votes.

Has anyone out there done something similar, or can someone suggest how to
do this? We're new to search engine tuning, so assume very little knowledge
on our part.

Thanks for your help!
JRG



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud question

2013-11-18 Thread Beale, Jim (US-KOP)
Thanks Michael,

I am having a terrible time getting this non-sharded index up.  Everything I 
try leads to a dead-end.

http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5

it uses the solrconfig.xml from another core.  That solrconfig.xml is deployed 
in conjunction with a solrcore.properties and the replication handler is 
configured with properties from that core's solrcore.properties file.  The 
CREATE action uses the solrconfig.xml but not the properties so it fails.

I tried to upload a different solrconfig.xml to zookeeper using the zkcli 
script -cmd upconfig and then to specify that config in the creation of the TP 
core like so

http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5&collection.configName=solrconfigTP.xml

However, how can replication masters and slaves be configured with a single 
solrconfig.xml file unless each node is allowed to have its own config?

This is a royal PITA. I may be wrong, but I think it is broken.  Without a way 
to specify numShards per core in solr.xml, it seems impossible to have one 
sharded core and one non-sharded core.

To be honest, I don't even care about replication.  Why can't I specify a core 
that is non-sharded, non-replicated and have the exact same core on all five of 
my boxes?



Thanks,
Jim


-Original Message-
From: michael.boom [mailto:my_sky...@yahoo.com]
Sent: Monday, November 18, 2013 7:14 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud question

Hi,

The Collections API provides some more options that will prove to be very
useful to you:
/admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname

Have a look at:
https://cwiki.apache.org/confluence/display/solr/Collections+API

Regarding your observations:
1. Completely normal, that's standard naming
2. When you created the collection you did not specify a configuration so
the new collection will use the conf already stored in ZK. If you have more
than one not sure which one will be picked as default.
3. You should be able to create replicas, by adding new cores on the other
machines, and specifying the collection name and shard id. The data will
then be replicated automatically to the new node. If you already tried that
and get errors/problems while doing it provide some more details.
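For point 3, the core admin call would look something like this (the host, core
and collection names are just placeholders):
http://host:8983/solr/admin/cores?action=CREATE&name=tp_shard1_replica2&collection=tp&shard=shard1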

As far as I know you should be able to move/replace the index data, as long
as the source collection has the same config as the target collection.
Afterwards you'll have to reload your core / restart the Solr instance - not
sure which one will do it - most likely the latter.
But it will be easier if you use the method described at point 3 above.
Please someone correct me if I'm wrong.



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-question-tp4101266p4101675.html
Sent from the Solr - User mailing list archive at Nabble.com.
The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.


RE: SolrCloud question

2013-11-18 Thread Beale, Jim (US-KOP)
I shouldn't be configuring the replication handler?  I didn't know that!

The documentation describes how to do it, e.g., for Solr 4.6

https://cwiki.apache.org/confluence/display/solr/Index+Replication

Now I'm even more confused than ever.  If a replication handler isn't defined, 
then I get "replication handler isn't defined" errors in the logs, and the 
added core fails to do anything.

It seems like such a simple task: create 1 sharded and 1 unsharded core.

But nothing I've tried so far works.  Why can't numShards be a property of the 
core???

Thanks,
Jim



-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, November 18, 2013 5:00 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud question

You shouldn't be configuring the replication handler if you are using solrcloud.

- Mark

On Nov 18, 2013, at 3:51 PM, Beale, Jim (US-KOP) jim.be...@hibu.com wrote:

 Thanks Michael,

 I am having a terrible time getting this non-sharded index up.  Everything I 
 try leads to a dead-end.

 http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5

 it uses the solrconfig.xml from another core.  That solrconfig.xml is 
 deployed in conjunction with a solrcore.properties and the replication 
 handler is configured with properties from that core's solrcore.properties 
 file.  The CREATE action uses the solrconfig.xml but not the properties so it 
 fails.

 I tried to upload a different solrconfig.xml to zookeeper using the zkcli 
 script -cmd upconfig and then to specify that config in the creation of the 
 TP core like so

 http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5&collection.configName=solrconfigTP.xml

 However, how can replication masters and slaves be configured with a single 
 solrconfig.xml file unless each node is allowed to have its own config?

 This is a royal PITA. I may be wrong, but I think it is broken.  Without a 
 way to specify numShards per core in solr.xml, it seems impossible to have 
 one sharded core and one non-sharded core.

 To be honest, I don't even care about replication.  Why can't I specify a 
 core that is non-sharded, non-replicated and have the exact same core on all 
 five of my boxes?



 Thanks,
 Jim


 -Original Message-
 From: michael.boom [mailto:my_sky...@yahoo.com]
 Sent: Monday, November 18, 2013 7:14 AM
 To: solr-user@lucene.apache.org
 Subject: RE: SolrCloud question

 Hi,

 The Collections API provides some more options that will prove to be very
 useful to you:
 /admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname

 Have a look at:
 https://cwiki.apache.org/confluence/display/solr/Collections+API

 Regarding your observations:
 1. Completely normal, that's standard naming
 2. When you created the collection you did not specify a configuration so
 the new collection will use the conf already stored in ZK. If you have more
 than one not sure which one will be picked as default.
 3. You should be able to create replicas, by adding new cores on the other
 machines, and specifying the collection name and shard id. The data will
 then be replicated automatically to the new node. If you already tried that
 and get errors/problems while doing it provide some more details.

 As far as i know you should be able to move/replace the index data, as long
 as the source collection has the same config as the target collection.
 Afterwards you'll have to reload your core / restart the Solr instance - not
 sure which one will do it - most likely the latter.
 But it will be easier if you use the method described at point 3 above.
 Please someone correct me, if i'm wrong.



 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-question-tp4101266p4101675.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 The information contained in this email message, including any attachments, 
 is intended solely for use by the individual or entity named above and may be 
 confidential. If the reader of this message is not the intended recipient, 
 you are hereby notified that you must not read, use, disclose, distribute or 
 copy any part of this communication. If you have received this communication 
 in error, please immediately notify me by email and destroy the original 
 message, including any attachments. Thank you.

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message

SolrCloud question

2013-11-15 Thread Beale, Jim (US-KOP)
Hello all,

I am trying to set up a SolrCloud deployment consisting of 5 boxes each of 
which is running Solr under jetty.  A zookeeper ensemble is running separately 
on 3 of the boxes.

Each Solr instance has 2 cores, one of which is sharded across the five boxes 
and the other not sharded at all because it is a much smaller index.  numShards 
is set to 5 in the command to start jetty, -DnumShards=5.

It turns out that getting this configuration to work is not as easy as I had 
hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core 
setup, you currently have to settle for the same
numShards for every core."  Unfortunately that JIRA was closed without any 
implementation.

Is this limitation still in effect?  Does the new core discovery mode offer 
anything in this regard?

Is there any way at all to deploy two cores with different numShards?

How hard would it be to implement this?  Is it compatible with the architecture 
of Solr 5?

Thanks,
Jim Beale


The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.


RE: SolrCloud question

2013-11-15 Thread Beale, Jim (US-KOP)
Hi Mark,

Thanks for the reply.

I am struggling a bit here. Sorry if these are basic questions!  I can't find 
the answers anywhere.

I modified my solr.xml on all boxes to comment out the core definition for 'tp'.
Then, I used /admin/collections?action=CREATE&name=tp&numShards=1 against one 
of the boxes.  That created 'shard1' for the tp index.

(1) It named the dir 'tp_shard1_replica1'
(2) The core seems to be using the same config as the bn core
(3) I am unable to create a similar core on the other boxes.

When I use replicationFactor=5, it creates replicas of the index on the other 
boxes.

Can I then copy a pre-existing LCN index into the data/index directory and have 
it replicate to the other boxes?

Thanks!

Jim



-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Friday, November 15, 2013 11:55 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud question

We are moving away from pre-defining SolrCores for SolrCloud. The correct 
approach would be to use the Collections API - then it is quite simple to 
change the number of shards for each collection you create.
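
For example, two CREATE calls along these lines should give you one 5-shard 
collection and one single-shard collection in the same cluster (the names are 
placeholders):

http://host:8983/solr/admin/collections?action=CREATE&name=bn&numShards=5&replicationFactor=1
http://host:8983/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5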

Hopefully our examples will move to doing this before long.

- Mark

On Nov 15, 2013, at 11:47 AM, Beale, Jim (US-KOP) jim.be...@hibu.com wrote:

 Hello all,

 I am trying to set up a SolrCloud deployment consisting of 5 boxes each of 
 which is running Solr under jetty.  A zookeeper ensemble is running 
 separately on 3 of the boxes.

 Each Solr instance has 2 cores, one of which is sharded across the five boxes 
 and the other not sharded at all because it is a much smaller index.  
 numShards is set to 5 in the command to start jetty, -DnumShards=5.

 It turns out that getting this configuration to work is not as easy as I had 
 hoped.  According to JIRA SOLR-3186, If you are bootstrapping a multi-core 
 setup, you currently have to settle for the same
 numShards for every core.  Unfortunately that JIRA was closed without any 
 implementation.

 Is this limitation still in effect?  Does the new core discovery mode offer 
 anything in this regard?

 Is there any way at all to deploy two cores with different numShards?

 How hard would it be to implement this?  Is it compatible with the 
 architecture of Solr 5?

 Thanks,
 Jim Beale


 The information contained in this email message, including any attachments, 
 is intended solely for use by the individual or entity named above and may be 
 confidential. If the reader of this message is not the intended recipient, 
 you are hereby notified that you must not read, use, disclose, distribute or 
 copy any part of this communication. If you have received this communication 
 in error, please immediately notify me by email and destroy the original 
 message, including any attachments. Thank you.

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.


Re: run filter queries after post filter

2013-10-09 Thread jim ferenczi
Hi Rohit,
The main problem is that if the query inside the filter does not have a
PostFilter implementation then your post filter is silently transformed
into a simple filter. The query field:value is based on the inverted
lists and does not have postfilter support.
If your field is a numeric field, take a look at the frange query parser,
which has post filter support.
To filter out documents with a field value less than 5:
fq={!frange l=5 cache=false cost=200}field(myField)
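In a full request that would look something like this (the field names are just
examples):
q=your main query&fq=category:books&fq={!frange l=5 cache=false cost=200}field(rating)
The cached fq and the main query run first; because of cache=false and the
cost >= 100, the frange filter runs as a post filter and only sees the documents
that survived them.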

Cheers,
Jim


2013/10/9 Rohit Harchandani rhar...@gmail.com

 yes i get that. actually i should have explained in more detail.

 - i have a query which gets certain documents.
 - the post filter gets these matched documents and does some processing on
 them and filters the results.
 - but after this is done i need to apply another filter - which is why i
 gave a higher cost to it.

 the reason i need to do this is because the processing done by the post
 filter depends on the documents matching the query till that point.
 since the normal fq clause is also getting executed before the post filter
 (despite the cost), the final results are not accurate

 thanks
 Rohit




 On Wed, Oct 9, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Ah, I think you're misunderstanding the nature of post-filters.
  Or I'm confused, which happens a lot!
 
  The whole point of post filters is that they're assumed to be
  expensive (think ACL calculation). So you want them to run
  on the fewest documents possible. So only docs that make it
  through the primary query _and_ all lower-cost filters will get
  to this post-filter. This means they can't be cached for
  instance, because they don't see (hopefully) very many docs.
 
  This is radically different than normal fq clauses, which are
  calculated on the entire corpus and can thus be cached.
 
  Best,
  Erick
 
  On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com
  wrote:
   Hey,
   so the post filter logs the number of ids that it receives.
   With the above filter having cost=200, the post filter should have
  received
   the same number of ids as before ( when the filter was not present ).
   But that does not seem to be the case...with the filter query on the
  index,
   the number of ids that the post filter is receiving reduces.
  
   Thanks,
   Rohit
  
  
   On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
  
   Hmmm, seems like it should. What's our evidence that it isn't working?
  
   Best,
   Erick
  
   On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com
   wrote:
Hey,
I am using solr 4.0 with my own PostFilter implementation which is
   executed
after the normal solr query is done. This filter has a cost of 100.
  Is it
possible to run filter queries on the index after the execution of
 the
   post
filter?
I tried adding the below line to the url but it did not seem to
 work:
fq={!cache=false cost=200}field:value
Thanks,
Rohit
  
 



RE: Indexing into SolrCloud

2013-07-19 Thread Beale, Jim (US-KOP)
Hi Erick!

Thanks for the reply.  When I call server.add() it is just to add a single 
document.

But, still, I think you might be correct about the size of the ultimate 
request.  I decided to grab the bull by the horns by instantiating my own 
HttpClient and, in so doing, my first run changed the following parameters,

SOLR_HTTP_THREAD_COUNT=4
SOLR_MAX_BUFFERED_DOCS=1
SOLR_MAX_CONNECTIONS=256
SOLR_MAX_CONNECTIONS_PER_HOST=128
SOLR_CONNECTION_TIMEOUT=0
SOLR_SO_TIMEOUT=0

I doubled the number of emptying threads, reduced the size of the request 
buffer 5x, increased the connection limits and set the timeouts to infinite.  
(I'm not actually sure what the defaults for the timeouts were since I didn't 
see them in the Solr code and didn't track it down.)
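
The relevant bit of the client setup now looks roughly like this (paraphrased; 
the property names are the HttpClientUtil constants from SolrJ 4.3):

// uses org.apache.solr.common.params.ModifiableSolrParams and
// org.apache.solr.client.solrj.impl.{HttpClientUtil, ConcurrentUpdateSolrServer}
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 256);
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 128);
params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 0); // 0 = no timeout
params.set(HttpClientUtil.PROP_SO_TIMEOUT, 0);         // 0 = no timeout
HttpClient httpClient = HttpClientUtil.createClient(params);

solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, httpClient,
        solrMaxBufferedDocs, solrThreadCount); // smaller queue, 4 emptying threads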

Anyway, the good news is that this combination of parameters worked.  The bad 
news is that I don't know whether it was resolved by changing one or more of 
the parameters.

But, regardless, I think the whole experiment verifies your thinking that the 
request was too big!

Thanks again!! :)


Jim Beale
Lead Developer
hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067




-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, July 19, 2013 8:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing into SolrCloud

Usually EOF errors indicate that the packets you're sending are too big.

Wait, though. 50K is not buffered docs, I think it's buffered _requests_.
So you're creating a queue that's ginormous and asking 2 threads to empty it.

But that's not really the issue I suspect. How many documents are you adding
at a time when you call server.add? I.e. are you using server.add(doc) or
server.add(doclist)? If the latter and you're adding a bunch of docs, try
lowering that number. If you're sending one doc at a time I'm on the
wrong track.

Best
Erick

On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) jim.be...@hibu.com wrote:
 Hey folks,

 I've been migrating an application which indexes about 15M documents from 
 straight-up Lucene into SolrCloud.  We've set up 5 Solr instances with a 3 
 zookeeper ensemble using HAProxy for load balancing. The documents are 
 processed on a quad core machine with 6 threads and indexed into SolrCloud 
 through HAProxy using ConcurrentUpdateSolrServer in order to batch the 
 updates.  The indexing box is heavily-loaded during indexing but I don't 
 think it is so bad that it would cause issues.

 I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 
 1.4.22.

 I've been accepting the default HttpClient with 50K buffered docs and 2 
 threads, i.e.,

 int solrMaxBufferedDocs = 5;
 int solrThreadCount = 2;
 solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, 
 solrMaxBufferedDocs, solrThreadCount);

 autoCommit is configured in the solrconfig as follows:

  <autoCommit>
    <maxTime>60</maxTime>
    <maxDocs>50</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

 I'm getting the following errors on the client and server sides respectively:

 Client side:

 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
 SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught 
 when processing request: Software caused connection abort: socket write error
 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
 SystemDefaultHttpClient - Retrying request
 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
 SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught 
 when processing request: Software caused connection abort: socket write error
 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
 SystemDefaultHttpClient - Retrying request

 Server side:

 7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore - 
 java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] 
 early EOF
 at 
 com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
 at 
 com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
 at 
 com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
 at 
 com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
 at 
 org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)

 When I disabled autoCommit on the server side, I didn't see any errors there 
 but I still get the issue client-side after about 2 million documents - which 
 is about 45 minutes.

 Has anyone seen this issue before?  I couldn't find anything useful on the 
 usual places.

 I suppose I could setup wireshark to see what is happening but I'm hoping 
 that someone has a better suggestion.

 Thanks in advance for any help!


 Best regards,
 Jim Beale

 hibu.com
 2201 Renaissance Boulevard, King of Prussia, PA, 19406
 Office: 610-879-3864
 Mobile: 610

Indexing into SolrCloud

2013-07-18 Thread Beale, Jim (US-KOP)
Hey folks,

I've been migrating an application which indexes about 15M documents from 
straight-up Lucene into SolrCloud.  We've set up 5 Solr instances with a 3 
zookeeper ensemble using HAProxy for load balancing. The documents are 
processed on a quad core machine with 6 threads and indexed into SolrCloud 
through HAProxy using ConcurrentUpdateSolrServer in order to batch the updates. 
 The indexing box is heavily-loaded during indexing but I don't think it is so 
bad that it would cause issues.

I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 
1.4.22.

I've been accepting the default HttpClient with 50K buffered docs and 2 
threads, i.e.,

int solrMaxBufferedDocs = 5;
int solrThreadCount = 2;
solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, 
solrMaxBufferedDocs, solrThreadCount);

autoCommit is configured in the solrconfig as follows:

 <autoCommit>
   <maxTime>60</maxTime>
   <maxDocs>50</maxDocs>
   <openSearcher>false</openSearcher>
 </autoCommit>

I'm getting the following errors on the client and server sides respectively:

Client side:

2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when 
processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
SystemDefaultHttpClient - Retrying request
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when 
processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
SystemDefaultHttpClient - Retrying request

Server side:

7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore - 
java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early 
EOF
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)

When I disabled autoCommit on the server side, I didn't see any errors there 
but I still get the issue client-side after about 2 million documents - which 
is about 45 minutes.

Has anyone seen this issue before?  I couldn't find anything useful on the 
usual places.

I suppose I could setup wireshark to see what is happening but I'm hoping that 
someone has a better suggestion.

Thanks in advance for any help!


Best regards,
Jim Beale

hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.


SEVERE:IOException occured when talking to server

2013-05-15 Thread Gesty, Jim
We have a simple SolrCloud setup (4.1) running with a single shard and
multiple replicas across 3 servers, and it's working fine except once in a 
while,
the leader logs this error. We fine-tuned GC among other things and everything 
is lightning fast. However, we still receive this SEVERE error a few times a 
day.


15-May-2013 08:38:36.701 SEVERE [tomcat-http--6] 
org.apache.solr.common.SolrException.log shard update error StdNode: 
http://10.10.4.118:11280/solr/core1/:org.apache.solr.client.solrj.SolrServerException:
 IOException occured when talking to server at: 
http://10.10.4.118:11280/solr/core1
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:416)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
at 
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to 
respond
at 
org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
at 
org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
... 11 more

15-May-2013 08:38:36.706 INFO [tomcat-http--6] 
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish try and 
ask http://10.10.4.118:11280/solr to recover
15-May-2013 08:38:36.706 INFO [tomcat-http--6] 
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient Creating new http 
client, config:maxConnections=128maxConnectionsPerHost

Any ideas on what the root cause of this issue may be and how we can remediate 
it?

J



NOTICE: This e-mail and any attachments is intended only for use by the 
addressee(s) named herein and may contain legally privileged, proprietary or 
confidential information. If you are not the intended recipient of this e-mail, 
you are hereby notified that any dissemination, distribution or copying of this 
email, and any attachments thereto, is strictly prohibited. If you receive this 
email in error please immediately notify me via reply email or at (800) 
927-9800 and permanently delete the original copy and any copy of any e-mail, 
and any printout.


RE: Highlighter showing matched query words only

2011-11-08 Thread Jim Cha
Erick,

Thank you for your response to my concerns! After reading some documentation, 
I came up with the following solution. It is not doing exactly what I would 
like it to do, but close.

Basically I set hl.snippets to be a large int, e.g. 50, and hl.fragsize a small 
positive int, e.g. 1. The parameter hl.snippets defines the maximum number of 
highlight snippets returned, and hl.fragsize defines the number of characters 
in each returned snippet. By setting hl.snippets=50&hl.fragsize=1, I can get a 
list of highlight snippets. Each snippet will include mainly the matched query 
words with a couple other words before or after the matched words. At least, 
the regex will have an easier job to do.
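Concretely, the request we are using looks something like this (the query and 
the highlighted field name are placeholders):
/select?q=godfather AND pacino&hl=true&hl.fl=content&hl.snippets=50&hl.fragsize=1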

It is simply a workaround before a formal solution can be found. I will post 
more information after I dig deeper in the issue.

Jim



From: Erick Erickson [via Lucene] [ml-node+s472066n348276...@n3.nabble.com]
Sent: Saturday, November 05, 2011 8:56 AM
To: Jian Ma
Subject: Re: Highlighter showing matched query words only

Not that I know of. The regex shouldn't be all that expensive, do you have
proof that this is a performance issue? If you don't, I'd just do the simple
thing first...

And probably just searching for em would be better than REs

Best
Erick

On Thu, Nov 3, 2011 at 7:04 PM, Nikeman [hidden email] 
wrote:

 Hello Folks,

 I am a newbie of Solr. I wonder if Solr Highlighter can show the matched
 query words only. Suppose my query is godfather AND pacino. I just want to
 display godfather and pacino in any of the highlighted fields. For the
 sake of performance, I do not want to use regular expressions to parse the
 text and locate the query words which are already enclosed between <em> and
 </em>. Solr obviously has already done the searching and highlighting, but
 the Solr output mixes what I want with what I do not want.

 I just want to get out the intermediate results, the matching query words,
 and nothing else.

 Is there a way to get the intermediate results, the matching query words,
 before they are mixed with other text? Thank you all very much for your help
 in advance!

 N. J.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Highlighter-showing-matched-query-words-only-tp3478731p3478731.html
 Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighter-showing-matched-query-words-only-tp3478731p3491212.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: HTTP Status 500 - Severe errors in solr configuration change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null ------------------------------------------------------------- org

2011-04-01 Thread jim
 Many thanks. The problem was solved with your help~~~

--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Status-500-Severe-errors-in-solr-configuration-change-abortOnConfigurationError-false-abortOnCo6-tp2757494p2762692.html
Sent from the Solr - User mailing list archive at Nabble.com.


HTTP Status 500 - Severe errors in solr configuration change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null ------------------------------------------------------------- org.xml

2011-03-31 Thread jim
hi all,
I am using Ubuntu 10.10 and I'm trying to get Solr 1.4 up and running, with no
success. I have followed this
http://ubuntuforums.org/showthread.php?t=1532230 to set up my Solr,
but I get this error:

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null
-
org.xml.sax.SAXParseException; lineNumber: 1036; columnNumber: 2; The markup
in the document following the root element must be well-formed. at
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:253)
at
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:288)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121) at
org.apache.solr.core.Config.(Config.java:110) at
org.apache.solr.core.SolrConfig.(SolrConfig.java:130) at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4001)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4651)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546) at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498) at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277) at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at
org.apache.catalina.core.StandardHost.start(StandardHost.java:785) at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:445) at
org.apache.catalina.core.StandardService.start(StandardService.java:519) at
org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at
org.apache.catalina.startup.Catalina.start(Catalina.java:581) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616) at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) 


Does anybody know what to do? Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Status-500-Severe-errors-in-solr-configuration-change-abortOnConfigurationError-false-abortOnCo6-tp2757493p2757493.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: HTTP Status 500 - Severe errors in solr configuration change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null ------------------------------------------------------------- org

2011-03-31 Thread jim
i need help

--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Status-500-Severe-errors-in-solr-configuration-change-abortOnConfigurationError-false-abortOnCo6-tp2757494p2761679.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: HTTP Status 500 - Severe errors in solr configuration change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null ------------------------------------------------------------- org

2011-03-31 Thread jim
Thanks. I'll check it, but I don't know how to get the right Solr
configuration.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Status-500-Severe-errors-in-solr-configuration-change-abortOnConfigurationError-false-abortOnCo6-tp2757493p2761753.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: HTTP Status 500 - Severe errors in solr configuration change: <abortOnConfigurationError>false</abortOnConfigurationError> in null

2011-03-31 Thread jim
I opened it in Firefox and found the mistake, but after correcting it the error still occurs:
HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
-------------------------------------------------------------
org.apache.solr.common.SolrException: invalid boolean value:
at org.apache.solr.common.util.StrUtils.parseBool(StrUtils.java:237)
at org.apache.solr.common.util.DOMUtil.addToNamedList(DOMUtil.java:140)
at org.apache.solr.common.util.DOMUtil.nodesToNamedList(DOMUtil.java:98)
at org.apache.solr.common.util.DOMUtil.childNodesToNamedList(DOMUtil.java:88)
at org.apache.solr.common.util.DOMUtil.addToNamedList(DOMUtil.java:142)
at org.apache.solr.common.util.DOMUtil.nodesToNamedList(DOMUtil.java:98)
at org.apache.solr.common.util.DOMUtil.childNodesToNamedList(DOMUtil.java:88)
at org.apache.solr.core.PluginInfo.<init>(PluginInfo.java:54)
at org.apache.solr.core.SolrConfig.readPluginInfos(SolrConfig.java:220)
at org.apache.solr.core.SolrConfig.loadPluginInfo(SolrConfig.java:212)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:184)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4001)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4651)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:785)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:445)
at org.apache.catalina.core.StandardService.start(StandardService.java:519)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
at org.apache.catalina.startup.Catalina.start(Catalina.java:581)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)


I still need help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Status-500-Severe-errors-in-solr-configuration-change-abortOnConfigurationError-false-abortOnCo6-tp2757494p2762082.html
Sent from the Solr - User mailing list archive at Nabble.com.
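
The "invalid boolean value:" at the top of that trace means Solr hit a boolean option in solrconfig.xml whose value is empty or not a recognised boolean; the call path through SolrConfig.readPluginInfos and PluginInfo suggests it is a parameter inside one of the plugin or request handler definitions. A minimal sketch of a well-formed entry is below; the handler name and parameter are illustrative only, not taken from the poster's file, and the abortOnConfigurationError element follows the stock example solrconfig.xml of that era:

  <abortOnConfigurationError>false</abortOnConfigurationError>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- boolean parameters need a value such as true or false;
           leaving the element empty would produce an error like the one above -->
      <bool name="omitHeader">false</bool>
    </lst>
  </requestHandler>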


Re: general debugging techniques?

2010-07-06 Thread Jim Blomo
On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog goks...@gmail.com wrote:
> You don't need to optimize, only commit.

OK, thanks for the tip, Lance.  I thought the "too many open files"
problem was because I wasn't optimizing/merging frequently enough.  My
understanding of your suggestion is that commit also does merging, and
since I am only building the index, not querying or updating it, I
don't need to optimize.
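
For what it's worth, if you want Solr to handle committing (and therefore merging) on its own during a bulk load, the autoCommit and merge settings in solrconfig.xml are one way to do it. The fragment below is only a sketch with placeholder numbers, using the element names from the stock example solrconfig.xml of that era; lowering mergeFactor or enabling the compound file format are the usual knobs for the "too many open files" symptom:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>  <!-- commit after this many added documents... -->
      <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
    </autoCommit>
  </updateHandler>

  <indexDefaults>
    <mergeFactor>10</mergeFactor>            <!-- lower values merge more aggressively, leaving fewer open segment files -->
    <useCompoundFile>true</useCompoundFile>  <!-- one compound file per segment, at some indexing cost -->
  </indexDefaults>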

> This means that the JVM spends 98% of its time doing garbage
> collection. This means there is not enough memory.

I'll increase the memory to 4G, decrease the documentCache to 5 and try again.
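
For reference, the documentCache is configured in the <query> section of solrconfig.xml; a sketch with the small size mentioned above would look roughly like the following (attribute names per the stock example config), with the heap raised separately via a JVM flag such as -Xmx4g:

  <query>
    <!-- size is the maximum number of cached documents; the documentCache
         is never autowarmed, so autowarmCount stays at 0 -->
    <documentCache class="solr.LRUCache" size="5" initialSize="5" autowarmCount="0"/>
  </query>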

> I made a mistake - the bug in Lucene is not about PDFs - it happens
> with every field in every document you index in any way- so doing this
> in Tika outside Solr does not help. The only trick I can think of is
> to alternate between indexing large and small documents. This way the
> bug does not need memory for two giant documents in a row.

I've checked out and built solr from branch_3x with the
tika-0.8-SNAPSHOT patch.  (Earlier I was having trouble with Tika
crashing too frequently.)  I've confirmed that LUCENE-2387 is fixed in
this branch so hopefully I won't run into that this time.

> Also, do not query the indexer at all. If you must, don't do sorted or
> faceting requests. These eat up a lot of memory that is only freed
> with the next commit (index reload).

Good to know, though I have not been querying the index and definitely
haven't ventured into faceted requests yet.

The advice is much appreciated,

Jim

