Re: Query timeAllowed and its behavior.

2015-08-28 Thread Shawn Heisey
On 8/28/2015 10:47 PM, William Bell wrote:
> As we reported, we are having issues with timeAllowed on 5.2.1. If we set
> timeAllowed=1 and then run the same query with timeAllowed=3, we get the
> same number of rows that was returned by the first query.
> 
> It appears the truncated results are cached when timeAllowed is exceeded,
> as if they were the complete, correct results.
> 
> SEEMS LIKE A BUG TO ME.

That sounds like a bug to me, too.

Is there any indication in the results of the first query that it was
aborted before it finished?  If Solr can detect that it aborted the
query, it should not be caching the results.
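
When the collector gives up, the response header is supposed to carry a
partialResults flag.  A quick way to check, assuming a core named "mycore"
and the JSON response writer:

curl 'http://localhost:8983/solr/mycore/select?q=*:*&timeAllowed=1&wt=json&indent=true'

# If the limit was hit, the header should contain something like:
#   "responseHeader": {"status": 0, "QTime": 1, "partialResults": true, ...}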

Thanks,
Shawn



Re: Query timeAllowed and its behavior.

2015-08-28 Thread William Bell
As we reported, we are having issues with timeAllowed on 5.2.1. If we set
timeAllowed=1 and then run the same query with timeAllowed=3, we get the
same number of rows that was returned by the first query.

It appears the truncated results are cached when timeAllowed is exceeded,
as if they were the complete, correct results.

SEEMS LIKE A BUG TO ME.

On Tue, Aug 25, 2015 at 5:16 AM, Jonathon Marks (BLOOMBERG/ LONDON) <
jmark...@bloomberg.net> wrote:

> timeAllowed applies to the time taken by the collector in each shard
> (TimeLimitingCollector). Once timeAllowed is exceeded the collector
> terminates early, returning any partial results it has and freeing the
> resources it was using.
> From Solr 5.0 timeAllowed also applies to the query expansion phase and
> SolrClient request retry.
>
> From: solr-user@lucene.apache.org At: Aug 25 2015 10:18:07
> Subject: Re:Query timeAllowed and its behavior.
>
> Hi,
>
> Kindly help me understand the query timeAllowed attribute. The following
> is set in solrconfig.xml:
> 30
>
> Does this setting stop the query from running after timeAllowed is
> reached? If not, is there a way to stop it, as it will otherwise occupy
> resources in the background for no benefit?
>
> Thanks,
> Modassar
>
>
>
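
To make the collector mechanism concrete, here is a minimal Lucene-level
sketch of a time-limited search -- the index path and tick budget are
invented, and this mirrors what TimeLimitingCollector does rather than
Solr's actual wiring:

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Counter;

public class TimeLimitDemo {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(Paths.get("/path/to/index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);
      TopScoreDocCollector top = TopScoreDocCollector.create(10);
      // The global counter is advanced in milliseconds by a shared timer thread.
      Counter clock = TimeLimitingCollector.getGlobalCounter();
      Collector limited = new TimeLimitingCollector(top, clock, 30);
      try {
        searcher.search(new MatchAllDocsQuery(), limited);
      } catch (TimeLimitingCollector.TimeExceededException e) {
        // Budget exhausted: whatever was collected so far is the partial result.
      }
      System.out.println("hits collected: " + top.topDocs().totalHits);
    }
  }
}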


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Dynamic field rule plugin?

2015-08-28 Thread Hari Iyer
Hi,

I am new to Solr and am trying to create dynamic field rules in my schema.

I would like to use the field name suffix to indicate other properties besides
the data type and multiValued flag provided in the default schema.

It appears that specifying this via a pattern leads to duplication, as there
are various combinations that need to be specified. It would help to have
code where I can build up parts of the rule, e.g.:

if suffix has '_s' then set stored=true

if suffix has '_m' then set multiValued=true

and so on.

 

From the documentation and various implementation examples (Drupal etc.), I
can only see them specifying all combinations.
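
For illustration, this is the kind of duplication the flat approach forces in
schema.xml (suffixes and types here are just examples):

<dynamicField name="*_s"  type="string" indexed="true" stored="true"  multiValued="false"/>
<dynamicField name="*_m"  type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_sm" type="string" indexed="true" stored="true"  multiValued="true"/>
<!-- ...one more line for every further combination of flags. -->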

Is there any way (plugin?) to incrementally build the rule?

Thanks,

Hari

 



RE: Data Import Handler Stays Idle

2015-08-28 Thread Allison, Timothy B.
> There are some zip files inside the directory that are referenced in the
> database. I'm thinking those are the ones it's jumping right over.

With SOLR-7189, which should have kicked in for 5.1, Tika shouldn't skip over 
zip files; it should process all the contents of those zips and concatenate the 
extracted text into one string.


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Tuesday, July 21, 2015 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Data Import Handler Stays Idle

On 7/21/2015 8:17 AM, Paden wrote:
> There are some zip files inside the directory that are referenced in the 
> database. I'm thinking those are the ones it's jumping right over. They 
> are not the issue. At least I'm 95% sure. And Shawn, if you're still 
> watching: I'm sorry, I'm using solr-5.1.0.

Have you started Solr with a larger heap than the default 512MB in Solr 5.x?  
Tika can require a lot of memory.  I would have expected there to be 
OutOfMemoryError exceptions in the log if that were the problem, though.

You may need to use the "-m" option on the startup scripts to increase the max 
heap.  Starting with "-m 2g" would be a good idea.
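
For example, with the 5.x startup scripts (stop first if Solr is already
running):

bin/solr stop -all
bin/solr start -m 2g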

Also, seeing the entire multi-line IOException from the log (which may be 
dozens of lines) could be important.

Thanks,
Shawn



RE: Data Import Handler Stays Idle

2015-08-28 Thread Allison, Timothy B.
Only a month late to respond, and the response likely won't help.

I agree with Shawn that Tika can be a memory hog.  I try to leave 1GB per 
thread, but your mileage will vary dramatically depending on your docs.  I'd 
expect that you'd get an OOM somewhere, though...

There have been rare bugs in various parsers, including the PDFParser, in 
various versions of Tika that cause permanent hangs.  I haven't experimented 
with DIH and known trigger files, but I suspect you'd get the behavior that 
you're seeing if this were to happen.

So, short of rolling your own ETL process in lieu of DIH, hardening DIH to run 
Tika in a different process (tika-server, perhaps -- 
https://issues.apache.org/jira/browse/SOLR-7632), or going big with Hadoop, 
morphlines, etc., your only hope is to upgrade Tika and hope that yours was one 
of the bugs that we've already identified and fixed.

If you do go with morphlines...I don't think this has been fixed yet: 
https://github.com/kite-sdk/kite/issues/397

Did you ever figure out what was going wrong?

Best,

 Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, July 21, 2015 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Data Import Handler Stays Idle

On 7/21/2015 8:17 AM, Paden wrote:
> There are some zip files inside the directory that are referenced in the 
> database. I'm thinking those are the ones it's jumping right over. They 
> are not the issue. At least I'm 95% sure. And Shawn, if you're still 
> watching: I'm sorry, I'm using solr-5.1.0.

Have you started Solr with a larger heap than the default 512MB in Solr 5.x?  
Tika can require a lot of memory.  I would have expected there to be 
OutOfMemoryError exceptions in the log if that were the problem, though.

You may need to use the "-m" option on the startup scripts to increase the max 
heap.  Starting with "-m 2g" would be a good idea.

Also, seeing the entire multi-line IOException from the log (which may be 
dozens of lines) could be important.

Thanks,
Shawn



PingRequestHandler and file corruption

2015-08-28 Thread Davis, Daniel (NIH/NLM) [C]
This is a resend to correct my awful subject.

From: Davis, Daniel (NIH/NLM) [C]
Sent: Friday, August 28, 2015 2:15 PM
To: solr-user@lucene.apache.org
Subject: ping handler very doubtful

So, I tested how the PingRequestHandler behaves after index corruption, in the 
following fashion:

cd server/corename/data/index
# some work with ls and awk to produce a script, and then it runs
dd if=/dev/urandom of=`pwd`/segments_10 bs=160 count=1
dd if=/dev/urandom of=`pwd`/_u.fdt bs=41512 count=1
dd if=/dev/urandom of=`pwd`/_u.fdx bs=100 count=1
dd if=/dev/urandom of=`pwd`/_u.fnm bs=1573 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.doc bs=16824 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.pos bs=17677 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.tim bs=54010 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.tip bs=1466 count=1
dd if=/dev/urandom of=`pwd`/_u.nvd bs=966 count=1
dd if=/dev/urandom of=`pwd`/_u.nvm bs=140 count=1
dd if=/dev/urandom of=`pwd`/_u.si bs=409 count=1

This did not cause any immediate problems with the PingRequestHandler, because 
the query was cached.
Worth filing a bug?

It did of course cause problems for the health monitor after a RELOAD or a 
complete Solr restart, which was enough for me.
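
For reference, the handler under test typically looks like the stock
definition from solrconfig.xml -- the cached query is whatever "q" is set
to here:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <lst name="defaults">
    <str name="echoParams">all</str>
  </lst>
</requestHandler>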

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH




Re: Indexing Fixed length file

2015-08-28 Thread Erik Hatcher
Ah yes, I should have made my example use tabs, though that currently would 
have required also adding “&separator=%09” to the params.

I definitely support using tabs for what they were intended for: delimiting 
columns of data.  +1, thanks for that mention, Alex.
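
Something like this, sketched against the same "fw" collection from my earlier
example (untested):

$ echo "Q36" | awk -v OFS='\t' '{ print substr($0, 1, 1), substr($0, 2, 2) }' |
bin/post -c fw -params "fieldnames=id,val&header=false&separator=%09" -type text/csv -d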






> On Aug 28, 2015, at 1:38 PM, Alexandre Rafalovitch  wrote:
> 
> Erik's version might be better with tabs though, to avoid CSV's
> requirements on escaping commas, quotes, etc. And maybe trim those
> fields a bit, either in awk or in a URP inside Solr.
> 
> But it would definitely work.
> 
> Regards,
>   Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
> 
> 
> On 28 August 2015 at 12:39, Erik Hatcher  wrote:
>> How about this incantation:
>> 
>> $ bin/solr create -c fw
>> $ echo "Q36" | awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' | 
>> bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv -d
>> $ curl 'http://localhost:8983/solr/fw/select?q=*:*&wt=csv'
>> val,_version_,id
>> 36,1510767115252006912,Q
>> 
>> With a big bunch of data, the stdin detection of bin/post doesn’t work well 
>> so I’d certainly recommend going to an intermediate real file (awk... > 
>> data.csv ; bin/post … data.csv) instead.
>> 
>> 
>> —
>> Erik Hatcher, Senior Solutions Architect
>> http://www.lucidworks.com
>> 
>> 
>> 
>> 
>>> On Aug 28, 2015, at 3:19 AM, timmsn  wrote:
>>> 
>>> Hello,
>>> 
>>> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
>>> that have fixed-length records and no whitespace to separate the fields.
>>> How can I program a template or something similar for my fields?
>>> Or can I edit the schema.xml to solve my problem?
>>> 
>>> This is one record from one file; each file has 40 - 100 records.
>>> 
>>> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
>>> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062
>>> 130445
>>> 
>>> 
>>> Thanks!
>>> 
>>> Tim
>>> 
>>> 
>>> 
>> 



Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
Erik's version might be better with tabs though, to avoid CSV's
requirements on escaping commas, quotes, etc. And maybe trim those
fields a bit, either in awk or in a URP inside Solr.

But it would definitely work.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 28 August 2015 at 12:39, Erik Hatcher  wrote:
> How about this incantation:
>
> $ bin/solr create -c fw
> $ echo "Q36" | awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' | 
> bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv -d
> $ curl 'http://localhost:8983/solr/fw/select?q=*:*&wt=csv'
> val,_version_,id
> 36,1510767115252006912,Q
>
> With a big bunch of data, the stdin detection of bin/post doesn’t work well 
> so I’d certainly recommend going to an intermediate real file (awk... > 
> data.csv ; bin/post … data.csv) instead.
>
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
>
>> On Aug 28, 2015, at 3:19 AM, timmsn  wrote:
>>
>> Hello,
>>
>> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
>> that have fixed-length records and no whitespace to separate the fields.
>> How can I program a template or something similar for my fields?
>> Or can I edit the schema.xml to solve my problem?
>>
>> This is one record from one file; each file has 40 - 100 records.
>>
>> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
>> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062
>> 130445
>>
>>
>> Thanks!
>>
>> Tim
>>
>>
>>
>


Re: Sorting by function

2015-08-28 Thread Philippe Soares
Thanks, Chris! I have the country as a single-valued field, so your solution
works perfectly!

On Fri, Aug 28, 2015 at 1:22 PM, Chris Hostetter 
wrote:

>
> : I have a "country" field in my index, with values like 'US', 'FR', 'UK',
> : etc...
> :
> : Then I want our users to be able to define the order of their preferred
> : countries so that grouped results are sorted according to their
> preference.
> ...
> : Is there any other function that would allow me to map from a predefined
> : String constant into an Integer that I can sort on ?
>
> Because of how they evolved, and most of the common use cases for them,
> there aren't a lot of functions that operate on "strings".
>
> Assuming your "country" field is a single valued (indexed) string field,
> then what you want can be done fairly simply using the the "termfreq()"
> function.
>
> termfreq(country,US) will return the (raw integer) term frequency for
> "Term(country,US)" for each doc -- assuming it's single valued (and not
> tokenized) that means for every doc it will be either a 0 or a 1.
>
> so you can either modify your earlier attempt at using "map" on the string
> values to do a map over the termfreq output, or you can simplify things:
> just multiply and take the max value -- where max is just shorthand for
> "the non-zero value" ...
>
> max(mul(9,termfreq(country,US)),
> mul(8,termfreq(country,FR)),
> mul(7,termfreq(country,UK)),
> ...)
>
> Things get more interesting/complicated if the field isn't single valued,
> or is tokenized -- then individual values (like "US") might have a
> termfreq that is greater than 1, or a doc might have more than one value,
> and you have to decide what kind of math operation you want to apply over
> those...
>
>   * ignore termfreqs and only look at whether the term exists?
> - wrap each termfreq in map to force the value to either 0 or 1
>   * want to sort by the sum of (weight * termfreq) for each term?
> - change max to sum in the above example
>   * ignore all but the "main" term that has the highest freq for each doc?
> - not easy at query time - best to figure out the "main" term at index
>   time and put it in its own field.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Sorting by function

2015-08-28 Thread Chris Hostetter

: I have a "country" field in my index, with values like 'US', 'FR', 'UK',
: etc...
: 
: Then I want our users to be able to define the order of their preferred
: countries so that grouped results are sorted according to their preference.
...
: Is there any other function that would allow me to map from a predefined
: String constant into an Integer that I can sort on ?

Because of how they evolved, and most of the common use cases for them, 
there aren't a lot of functions that operate on "strings".

Assuming your "country" field is a single valued (indexed) string field, 
then what you want can be done fairly simply using the the "termfreq()" 
function.

termfreq(country,US) will return the (raw integer) term frequency for 
"Term(country,US)" for each doc -- assuming it's single valued (and not 
tokenized) that means for every doc it will be either a 0 or a 1.

so you can either modify your earlier attempt at using "map" on the string 
values to do a map over the termfreq output, or you can simplify things: 
just multiply and take the max value -- where max is just shorthand for 
"the non-zero value" ...

max(mul(9,termfreq(country,US)),
mul(8,termfreq(country,FR)),
mul(7,termfreq(country,UK)),
...)
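
Dropped into a request, that might look like this (collection name made up;
the space before "desc" has to be URL-encoded):

curl 'http://localhost:8983/solr/mycoll/select?q=*:*&sort=max(mul(9,termfreq(country,US)),mul(8,termfreq(country,FR)),mul(7,termfreq(country,UK)))%20desc'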

Things get more interesting/complicated if the field isn't single valued, 
or is tokenized -- then individual values (like "US") might have a 
termfreq that is greater than 1, or a doc might have more than one value, 
and you have to decide what kind of math operation you want to apply over 
those...

  * ignore termfreqs and only look at whether the term exists? 
- wrap each termfreq in map to force the value to either 0 or 1
  * want to sort by the sum of (weight * termfreq) for each term?
- change max to sum in the above example
  * ignore all but the "main" term that has the highest freq for each doc?
- not easy at query time - best to figure out the "main" term at index 
  time and put it in its own field.


-Hoss
http://www.lucidworks.com/


Sorting by function

2015-08-28 Thread Philippe Soares
Hi,
I'm trying to apply Solr's "Sort by function"
capability to solve the following use case:

I have a "country" field in my index, with values like 'US', 'FR', 'UK',
etc...

Then I want our users to be able to define the order of their preferred
countries so that grouped results are sorted according to their preference.

I need something like the map function, that assigns a number to each
country code and use that for sorting, based on the users' preference.

I tried to sort my groups by adding something like map(country, 'FR', 'FR',
1) to the field list, but map seems to only work for numerical values. I
get errors like:

Error parsing fieldname: Expected float instead of quoted string:FR

Is there any other function that would allow me to map from a predefined
String constant into an Integer that I can sort on ?

Thanks in advance.


Re: Indexing Fixed length file

2015-08-28 Thread Erik Hatcher
How about this incantation:

$ bin/solr create -c fw
$ echo "Q36" | awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' | 
bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv -d
$ curl 'http://localhost:8983/solr/fw/select?q=*:*&wt=csv'
val,_version_,id
36,1510767115252006912,Q

With a big bunch of data, the stdin detection of bin/post doesn’t work well so 
I’d certainly recommend going to an intermediate real file (awk... > data.csv ; 
bin/post … data.csv) instead.


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




> On Aug 28, 2015, at 3:19 AM, timmsn  wrote:
> 
> Hello,
> 
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length records and no whitespace to separate the fields.
> How can I program a template or something similar for my fields?
> Or can I edit the schema.xml to solve my problem?
> 
> This is one record from one file; each file has 40 - 100 records.
> 
> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062 
> 130445 
> 
> 
> Thanks! 
> 
> Tim
> 
> 
> 



Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
If you use DataImportHandler, you can combine LineEntityProcessor with
RegexTransformer to split each line into a bunch of fields:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheRegexTransformer

You could then trim the whitespace in an UpdateRequestProcessor chain
set up to run after DIH, using TrimFieldUpdateProcessorFactory:
http://www.solr-start.com/info/update-request-processors/#TrimFieldUpdateProcessorFactory
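
A minimal data-config sketch along those lines -- the file path and column
widths are invented, and LineEntityProcessor exposes each line as a "rawLine"
field that RegexTransformer then slices:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="records" processor="LineEntityProcessor"
            url="/path/to/records.txt"
            transformer="RegexTransformer">
      <!-- Hypothetical widths; the real record layout drives the groups. -->
      <field column="id"   regex="^(.{11}).*"      sourceColName="rawLine"/>
      <field column="acct" regex="^.{14}(.{11}).*" sourceColName="rawLine"/>
    </entity>
  </document>
</dataConfig>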

I think this should do the job. With bin/post, you could set up a
custom URP chain as well, but it does not have an equivalent of
RegexTransformer that splits one field into multiple other fields. Not that
it would be hard to write one; nobody has done it yet.

Regards,
   Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 28 August 2015 at 03:19, timmsn  wrote:
> Hello,
>
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length records and no whitespace to separate the fields.
> How can I program a template or something similar for my fields?
> Or can I edit the schema.xml to solve my problem?
>
> This is one record from one file; each file has 40 - 100 records.
>
> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062
> 130445
>
>
> Thanks!
>
> Tim
>
>
>


Re: Indexing Fixed length file

2015-08-28 Thread Steve Rowe
Hi Tim,

I haven’t heard of people indexing this kind of input with Solr, but the format 
is quite similar to CSV/TSV files, with the exception that the fields have 
fixed positions and the separators are omitted.

You could write a short script to insert separators (e.g. commas) at these 
points (but be sure to escape quotation marks and the separators) and then use 
Solr’s CSV update functionality: 
.
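
As a rough sketch of such a script (column widths invented, a collection named
"fw" assumed, and embedded quotes in the data would still need escaping):

awk '{ printf "%s,%s,\"%s\"\n", substr($0,1,11), substr($0,15,11), substr($0,27) }' records.txt > records.csv
bin/post -c fw -params "fieldnames=recid,acct,rest&header=false" -type text/csv records.csv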

I think dealing with fixed-width fields directly would be a nice addition to 
Solr’s CSV update capabilities - feel free to make an issue - see 
.

Steve
www.lucidworks.com

> On Aug 28, 2015, at 3:19 AM, timmsn  wrote:
> 
> Hello,
> 
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length records and no whitespace to separate the fields.
> How can I program a template or something similar for my fields?
> Or can I edit the schema.xml to solve my problem?
> 
> This is one record from one file; each file has 40 - 100 records.
> 
> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062 
> 130445 
> 
> 
> Thanks! 
> 
> Tim
> 
> 
> 



Re: Indexing Fixed length file

2015-08-28 Thread Erick Erickson
Solr doesn't know anything about such a file. The post program expects
well-defined structures; see the XML and JSON formats in example/exampledocs.

So you either have to transform the data into the form expected by the bin/post
tool or perhaps you can use the CSV import, see:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates

Best,
Erick

On Fri, Aug 28, 2015 at 12:19 AM, timmsn  wrote:
> Hello,
>
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length records and no whitespace to separate the fields.
> How can I program a template or something similar for my fields?
> Or can I edit the schema.xml to solve my problem?
>
> This is one record from one file; each file has 40 - 100 records.
>
> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062
> 130445
>
>
> Thanks!
>
> Tim
>
>
>


Re: Solr 5.2: Same Document in Multiple Shard

2015-08-28 Thread Erick Erickson
Have you done anything special in terms of routing, or are you using
the default compositeId? How are you indexing? Docs are considered
"identical" in Solr based solely on the uniqueKey field. If that's
exactly the same (possibly including extra whitespace) then this
shouldn't be happening. Nobody else has reported this, so I suspect
there's something odd about your setup.

The clusterstate for the collection would be interesting to see, as
well as your schema definition for your ID field.
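
One way to see where each copy lives is to hit the cores directly with
distrib=false (host and core names here are invented):

curl 'http://host1:8983/solr/coll_shard1_replica1/select?q=id:SOMEDOC&distrib=false&fl=id'
curl 'http://host2:8983/solr/coll_shard2_replica1/select?q=id:SOMEDOC&distrib=false&fl=id'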

Best,
Erick

On Fri, Aug 28, 2015 at 12:52 AM, Maulin Rathod  wrote:
> We have recently upgraded Solr from 4.8 to 5.2. We have 2 shards and 2 replicas 
> in SolrCloud, and it shows correctly in the Solr Admin Panel.
>
> We found that sometimes the same document is available in both shards. We 
> confirmed this by querying each shard individually (from the Solr admin UI, 
> passing the shards parameter).
>
> Can it be due to some configuration issue? How can we fix it?
>
> -Maulin
>


Re: solrcloud and core swapping

2015-08-28 Thread Shawn Heisey
On 8/28/2015 8:25 AM, Shawn Heisey wrote:
> Instead, use collection aliasing. Create collections named something
> like foo_0 and foo_1, and update the alias "foo" to point to whichever
> of them is currently live. Your queries and update requests will never
> need to know about foo_0 and foo_1 ... only the coordinating part of
> your system, where you would normally do your core swapping, needs to
> know about those. 

You might also want to have a foo_build alias pointing to the *other*
collection for any "full rebuild" functionality, so it can also use a
static collection name.

Thanks,
Shawn



Re: solrcloud and core swapping

2015-08-28 Thread Shawn Heisey
On 8/28/2015 8:10 AM, Bill Au wrote:
> Is core swapping supported in SolrCloud?  If I have a 5 nodes SolrCloud
> cluster and I do a core swap on the leader, will the core be swapped on the
> other 4 nodes as well?  Or do I need to do a core swap on each node?

When you're running SolrCloud, swapping any of the cores might really
screw things up.  I think it might be a good idea for Solr to return a
"not supported in cloud mode" failure on certain CoreAdmin actions.

Instead, use collection aliasing.  Create collections named something
like foo_0 and foo_1, and update the alias "foo" to point to whichever
of them is currently live.  Your queries and update requests will never
need to know about foo_0 and foo_1 ... only the coordinating part of
your system, where you would normally do your core swapping, needs to
know about those.
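
A sketch of that flow with the Collections API (URLs illustrative; re-issuing
CREATEALIAS repoints an existing alias):

# Clients only ever talk to "foo".
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=foo&collections=foo_0'
# Rebuild into the other collection, then flip the alias atomically.
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=foo&collections=foo_1'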

Thanks,
Shawn



solrcloud and core swapping

2015-08-28 Thread Bill Au
Is core swapping supported in SolrCloud?  If I have a 5 nodes SolrCloud
cluster and I do a core swap on the leader, will the core be swapped on the
other 4 nodes as well?  Or do I need to do a core swap on each node?

Bill


Re: What is the correct path for mysql jdbc connector on Solr?

2015-08-28 Thread Shawn Heisey
On 8/28/2015 6:18 AM, Merlin Morgenstern wrote:
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
> load driver: com.mysql.jdbc.Driver Processing Document # 1
> 
> How many directories do I have to go up inside the config "../... " ?

This is the way I always recommend dealing with extra required jars:

Remove all <lib> directives from solrconfig.xml.
On each server, create a lib directory in the solr home.
(the solr home is where solr.xml lives)
Copy all required extra jars to that lib directory.
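
Concretely, something like this on each node (the solr home path here is just
an example):

mkdir -p /var/solr/data/lib
cp mysql-connector-java-5.1.36-bin.jar /var/solr/data/lib/
bin/solr restart -p 8983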

There is currently a problem with this approach when using the Lucene
ICU analysis components.  You must use the full class name instead of
something like "solr.ICUFoldingFilterFactory".  This doesn't seem to
affect any classes other than the ICU analysis components.

https://issues.apache.org/jira/browse/SOLR-6188

Thanks,
Shawn



What is the correct path for mysql jdbc connector on Solr?

2015-08-28 Thread Merlin Morgenstern
I have a SolrCloud installation running on 3 machines, where I would like to
import data from MySQL. Unfortunately the import fails due to a missing
JDBC connector.

My guess is that I am having trouble with the right directory.

solrconfig.xml:

  

file location:

node1:/opt/solr-5.2.1/dist/mysql-connector-java-5.1.36-bin.jar

error message:


Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.mysql.jdbc.Driver Processing Document # 1

How many directories do I have to go up inside the config "../... " ?

The config is uploaded OK to ZooKeeper, and Solr has been restarted.

Thank you for any help on this!


RE: "no default request handler is registered"

2015-08-28 Thread Scott Hollenbeck
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, August 27, 2015 3:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: "no default request handler is registered"
> 
> On 8/27/2015 1:10 PM, Scott Hollenbeck wrote:
> > I'm doing some experimenting with Solr 5.3 and the 7.x-1.x-dev version of
> > the Apache Solr Search module for Drupal. Things seem to be working fine,
> > except that this warning message appears in the Solr admin logging window
> > and in the server log:
> >
> > "no default request handler is registered (either '/select' or 'standard')"
> >
> > Looking at the solrconfig.xml file that comes with the Drupal module I see a
> > requestHandler named "standard":
> >
> >   
> >  
> >content
> >explicit
> >true
> >  
> >   
> >
> > I also see a handler named pinkPony with a "default" attribute set to
> > "true":
> 
> 
> 
> > So it seems like there are both standard and default requestHandlers
> > specified. Why is the warning produced? What am I missing?
> 
> I think the warning message may be misworded, or logged in incorrect
> circumstances, and might need some attention.
> 
> The solrconfig.xml that you are using (which I assume came from the
> Drupal project) is geared towards a 3.x version of Solr prior to 3.6.x
> (the last minor version in the 3.x line).
> 
> Starting in the 3.6 version, all request handlers in examples have names
> that start with a forward slash, like "/select", none of them have the
> "default" attribute, and the handleSelect parameter found elsewhere in
> the solrconfig.xml is false.
> 
> You should bring this up with the Drupal folks and ask them to upgrade
> their config/schema and their code for modern versions of Solr.  Solr
> 3.6.0 (which deprecated their handler naming convention and the
> "default" attribute) was released over three years ago.

Thanks for the replies. The config files I'm using came from a Drupal sandbox 
project that's focused on Solr 5.x compatibility. I've added an issue to that 
project's queue. We'll see how it goes.

Scott Hollenbeck
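
(For comparison, the modern convention Shawn describes looks like the stock
5.x definition -- the name starts with a slash and there is no "default"
attribute:)

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>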



Solr 5.2: Same Document in Multiple Shard

2015-08-28 Thread Maulin Rathod
We have recently upgraded Solr from 4.8 to 5.2. We have 2 shards and 2 replicas 
in SolrCloud, and it shows correctly in the Solr Admin Panel.

We found that sometimes the same document is available in both shards. We 
confirmed this by querying each shard individually (from the Solr admin UI, 
passing the shards parameter).

Can it be due to some configuration issue? How can we fix it?

-Maulin




Indexing Fixed length file

2015-08-28 Thread timmsn
Hello,

I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
that have fixed-length records and no whitespace to separate the fields.
How can I program a template or something similar for my fields?
Or can I edit the schema.xml to solve my problem?

This is one record from one file; each file has 40 - 100 records.

AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062 
130445 


Thanks! 

Tim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807.html
Sent from the Solr - User mailing list archive at Nabble.com.