Solr 5.2: Same Document in Multiple Shard

2015-08-28 Thread Maulin Rathod
We have recently upgraded Solr from 4.8 to 5.2. We have 2 shards and 2 replicas 
in SolrCloud, and everything shows up correctly in the cloud view of the Solr 
Admin Panel.

We found that sometimes the same document is present in both shards. We 
confirmed this by querying each shard individually (from the Solr admin, by 
passing the shards parameter).

Can it be due to some configuration issue? How can we fix it?

-Maulin




Indexing Fixed length file

2015-08-28 Thread timmsn
Hello,

I use Solr 5.2.1 and the bin/post tool. I am trying to index some files whose 
records have fixed-length fields with no whitespace separating the values. 
How can I define a template or pattern for my fields?
Or can I edit the schema.xml to handle this?

This is one record from one file; each file contains 40 - 100 records.

AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062 
130445 


Thanks! 

Tim





RE: no default request handler is registered

2015-08-28 Thread Scott Hollenbeck
 -Original Message-
 From: Shawn Heisey [mailto:apa...@elyograg.org]
 Sent: Thursday, August 27, 2015 3:51 PM
 To: solr-user@lucene.apache.org
 Subject: Re: no default request handler is registered
 
 On 8/27/2015 1:10 PM, Scott Hollenbeck wrote:
  I'm doing some experimenting with Solr 5.3 and the 7.x-1.x-dev version of
  the Apache Solr Search module for Drupal. Things seem to be working fine,
  except that this warning message appears in the Solr admin logging window
  and in the server log:
 
  no default request handler is registered (either '/select' or 'standard')
 
  Looking at the solrconfig.xml file that comes with the Drupal module I see a
  requestHandler named standard:
 
<requestHandler name="standard" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">content</str>
    <str name="echoParams">explicit</str>
    <bool name="omitHeader">true</bool>
  </lst>
</requestHandler>
 
  I also see a handler named "pinkPony" with a "default" attribute set to
  "true":
 
 <snip>
 
  So it seems like there are both standard and default requestHandlers
  specified. Why is the warning produced? What am I missing?
 
 I think the warning message may be misworded, or logged in incorrect
 circumstances, and might need some attention.
 
 The solrconfig.xml that you are using (which I assume came from the
 Drupal project) is geared towards a 3.x version of Solr prior to 3.6.x
 (the last minor version in the 3.x line).
 
 Starting in the 3.6 version, all request handlers in examples have names
 that start with a forward slash, like /select, none of them have the
 default attribute, and the handleSelect parameter found elsewhere in
 the solrconfig.xml is false.
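 As a rough sketch (not the exact Drupal config; the df value is borrowed
 from the snippet above), the modern style looks more like this:
 
 <requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="df">content</str>
     <str name="echoParams">explicit</str>
   </lst>
 </requestHandler>
 
 <requestDispatcher handleSelect="false">
   ...
 </requestDispatcher>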
 
 You should bring this up with the Drupal folks and ask them to upgrade
 their config/schema and their code for modern versions of Solr.  Solr
 3.6.0 (which deprecated their handler naming convention and the
 default attribute) was released over three years ago.

Thanks for the replies. The config files I'm using came from a Drupal sandbox 
project that's focused on Solr 5.x compatibility. I've added an issue to that 
project's queue. We'll see how it goes.

Scott Hollenbeck



Re: What is the correct path for mysql jdbc connector on Solr?

2015-08-28 Thread Shawn Heisey
On 8/28/2015 6:18 AM, Merlin Morgenstern wrote:
 Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
 load driver: com.mysql.jdbc.Driver Processing Document # 1
 
 How many directories do I have to go up inside the config ../...  ?

This is the way I always recommend dealing with extra required jars:

Remove all lib directives from solrconfig.xml.
On each server, create a lib directory in the solr home.
(the solr home is where solr.xml lives)
Copy all required extra jars to that lib directory.
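
For example, a minimal sketch for this thread's case, assuming the solr home
is /var/solr and Solr is installed in /opt/solr-5.2.1 (adjust both paths to
your install):

mkdir /var/solr/lib
cp /opt/solr-5.2.1/dist/solr-dataimporthandler-*.jar /var/solr/lib/
cp mysql-connector-java-5.1.36-bin.jar /var/solr/lib/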

There is currently a problem with this approach when using the Lucene
ICU analysis components.  You must use the full class name instead of
something like solr.ICUFoldingFilterFactory.  This doesn't seem to
affect any classes other than the ICU analysis components.

https://issues.apache.org/jira/browse/SOLR-6188

Thanks,
Shawn



What is the correct path for mysql jdbc connector on Solr?

2015-08-28 Thread Merlin Morgenstern
I have a SolrCloud installation running on 3 machines, where I would like to
import data from MySQL. Unfortunately, the import fails because the JDBC
connector cannot be loaded.

My guess is that I am having trouble with the right directory.

solrconfig.xml:

  <lib dir="${solr.install.dir:../../..}/dist/"
       regex="solr-dataimporthandler-.*\.jar" />

file location:

node1:/opt/solr-5.2.1/dist/mysql-connector-java-5.1.36-bin.jar

error message:


Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.mysql.jdbc.Driver Processing Document # 1

How many directories do I have to go up inside the config ../...  ?

The config has been uploaded to ZooKeeper OK, and Solr has been restarted.

Thank you for any help on this!


solrcloud and core swapping

2015-08-28 Thread Bill Au
Is core swapping supported in SolrCloud?  If I have a 5-node SolrCloud
cluster and I do a core swap on the leader, will the core be swapped on the
other 4 nodes as well?  Or do I need to do a core swap on each node?

Bill


Re: Indexing Fixed length file

2015-08-28 Thread Steve Rowe
Hi Tim,

I haven’t heard of people indexing this kind of input with Solr, but the format 
is quite similar to CSV/TSV files, except that the fields have fixed positions 
and the separators are omitted.

You could write a short script to insert separators (e.g. commas) at these 
points (but be sure to escape quotation marks and the separators) and then use 
Solr’s CSV update functionality: 
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates.
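
For example, a small awk sketch (the column widths here are made up --
adjust the substr() offsets to the real record layout):

awk '{
  id   = substr($0, 1, 11)
  code = substr($0, 15, 11)
  name = substr($0, 27, 30)
  gsub(/"/, "\"\"", name)          # escape embedded double quotes for CSV
  printf "%s,%s,\"%s\"\n", id, code, name
}' records.txt > records.csv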

I think dealing with fixed-width fields directly would be a nice addition to 
Solr’s CSV update capabilities - feel free to make an issue - see 
http://wiki.apache.org/solr/HowToContribute.

Steve
www.lucidworks.com

 On Aug 28, 2015, at 3:19 AM, timmsn tim.hammac...@web.de wrote:
 
 <snip>



Re: solrcloud and core swapping

2015-08-28 Thread Shawn Heisey
On 8/28/2015 8:10 AM, Bill Au wrote:
 Is core swapping supported in SolrCloud?  If I have a 5-node SolrCloud
 cluster and I do a core swap on the leader, will the core be swapped on the
 other 4 nodes as well?  Or do I need to do a core swap on each node?

When you're running SolrCloud, swapping any of the cores might really
screw things up.  I think it might be a good idea for Solr to return a
"not supported in cloud mode" failure on certain CoreAdmin actions.

Instead, use collection aliasing.  Create collections named something
like foo_0 and foo_1, and update the alias foo to point to whichever
of them is currently live.  Your queries and update requests will never
need to know about foo_0 and foo_1 ... only the coordinating part of
your system, where you would normally do your core swapping, needs to
know about those.
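
For example, re-pointing the alias with the Collections API might look like
this (a sketch -- host, port, and names assumed):

curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=foo&collections=foo_1"

CREATEALIAS overwrites an existing alias, so the same call flips "foo"
between foo_0 and foo_1.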

Thanks,
Shawn



Re: Indexing Fixed length file

2015-08-28 Thread Erick Erickson
Solr doesn't know anything about such a file. The post program expects
well-defined structures; see the xml and json formats in example/exampledocs.

So you either have to transform the data into the form expected by the bin/post
tool or perhaps you can use the CSV import, see:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates

Best,
Erick

On Fri, Aug 28, 2015 at 12:19 AM, timmsn tim.hammac...@web.de wrote:
 <snip>


Re: Solr 5.2: Same Document in Multiple Shard

2015-08-28 Thread Erick Erickson
Have you done anything special in terms of routing or are you using
the default compositeId? How are you indexing? Docs are considered
identical in Solr based solely on the uniqueKey field. If that's
absolutely the same (possibly including extra whitespace), then this
shouldn't be happening; nobody else has reported this, so I suspect
there's something odd about your setup.

The clusterstate for the collection would be interesting to see, as
well as your schema definition for your ID field.
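
For reference, one way to confirm the duplication is to query each replica
core directly with distrib=false (a sketch -- the core names below are the
SolrCloud defaults, and the uniqueKey field is assumed to be "id"):

curl "http://host1:8983/solr/collection1_shard1_replica1/select?q=id:YOUR_DOC_ID&distrib=false"
curl "http://host2:8983/solr/collection1_shard2_replica1/select?q=id:YOUR_DOC_ID&distrib=false"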

Best,
Erick

On Fri, Aug 28, 2015 at 12:52 AM, Maulin Rathod mrat...@asite.com wrote:
 <snip>



Re: solrcloud and core swapping

2015-08-28 Thread Shawn Heisey
On 8/28/2015 8:25 AM, Shawn Heisey wrote:
 Instead, use collection aliasing. Create collections named something
 like foo_0 and foo_1, and update the alias foo to point to whichever
 of them is currently live. Your queries and update requests will never
 need to know about foo_0 and foo_1 ... only the coordinating part of
 your system, where you would normally do your core swapping, needs to
 know about those. 

You might also want to have a foo_build alias pointing to the *other*
collection for any full rebuild functionality, so it can also use a
static collection name.
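
E.g., while "foo" is pointing at foo_1, a rebuild could go through something
like this (same host/port assumptions as the earlier sketch):

curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=foo_build&collections=foo_0"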

Thanks,
Shawn



Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
If you use DataImportHandler, you can combine LineEntityProcessor with
RegexTransformer to split each line into a bunch of fields:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheRegexTransformer

You could then trim the whitespace in an UpdateRequestProcessor chain
that you can set up to run after DIH, using the TrimFieldUpdate URP:
http://www.solr-start.com/info/update-request-processors/#TrimFieldUpdateProcessorFactory
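
A rough data-config sketch (the field widths and names here are invented for
illustration; adjust the regex groups to the real layout):

<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="records" processor="LineEntityProcessor"
            url="/path/to/fixed-width.txt"
            transformer="RegexTransformer">
      <!-- LineEntityProcessor emits each line as "rawLine";
           the regex groups split it into separate fields -->
      <field column="rawLine" regex="^(.{11})(.{14})(.*)$"
             groupNames="id,account,name" />
    </entity>
  </document>
</dataConfig>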

I think this should do the job. With bin/post, you could set up a
custom URP chain as well, but it does not have an equivalent of
RegexTransformer that splits a value into multiple other fields. Not that
one would be hard to write; nobody has done it yet.

Regards,
   Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 28 August 2015 at 03:19, timmsn tim.hammac...@web.de wrote:
 <snip>


Re: Query timeAllowed and its behavior.

2015-08-28 Thread William Bell
As we reported, we are having issues with timeAllowed on 5.2.1. If we set
timeAllowed=1 and then run the same query with timeAllowed=3, we get the
number of rows that was returned by the first query.

It appears the results are cached when timeAllowed is exceeded, as if the
truncated results were correct.

SEEMS LIKE A BUG TO ME.
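
For reference, timeAllowed can also be passed per request rather than set in
solrconfig.xml (a sketch; host and core name assumed):

curl "http://localhost:8983/solr/collection1/select?q=foo&timeAllowed=100"

When the collector terminates early, the responseHeader should carry
partialResults=true.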

On Tue, Aug 25, 2015 at 5:16 AM, Jonathon Marks (BLOOMBERG/ LONDON) 
jmark...@bloomberg.net wrote:

 timeAllowed applies to the time taken by the collector in each shard
 (TimeLimitingCollector). Once timeAllowed is exceeded the collector
 terminates early, returning any partial results it has and freeing the
 resources it was using.
 From Solr 5.0 timeAllowed also applies to the query expansion phase and
 SolrClient request retry.

 From: solr-user@lucene.apache.org At: Aug 25 2015 10:18:07
 Subject: Re:Query timeAllowed and its behavior.

 Hi,

 Kindly help me understand the query time allowed attribute. The following
 is set in solrconfig.xml.
 <int name="timeAllowed">30</int>

 Does this setting stop the query from running after the timeAllowed is
 reached? If not, is there a way to stop it, as it will otherwise occupy
 resources in the background for no benefit?

 Thanks,
 Modassar





-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Query timeAllowed and its behavior.

2015-08-28 Thread Shawn Heisey
On 8/28/2015 10:47 PM, William Bell wrote:
 As we reported, we are having issues with timeAllowed on 5.2.1. If we set
 timeAllowed=1 and then run the same query with timeAllowed=3, we get the
 number of rows that was returned by the first query.
 
 It appears the results are cached when timeAllowed is exceeded, as if the
 truncated results were correct.
 
 SEEMS LIKE A BUG TO ME.

That sounds like a bug to me, too.

Is there any indication in the results the first time that the query was
aborted before it finished?  If Solr can detect that it aborted the
query, it should not be caching the results.

Thanks,
Shawn



Re: Indexing Fixed length file

2015-08-28 Thread Erik Hatcher
How about this incantation:

$ bin/solr create -c fw
$ echo "Q36" | awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' | 
bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv -d
$ curl 'http://localhost:8983/solr/fw/select?q=*:*&wt=csv'
val,_version_,id
36,1510767115252006912,Q

With a big bunch of data, the stdin detection of bin/post doesn’t work well so 
I’d certainly recommend going to an intermediate real file (awk ... > data.csv ; 
bin/post ... data.csv) instead.
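
Spelled out, that two-step variant might look like this (file names
hypothetical):

$ awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' fixed.txt > data.csv
$ bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv data.csv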


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




 On Aug 28, 2015, at 3:19 AM, timmsn tim.hammac...@web.de wrote:
 
 <snip>



Sorting by function

2015-08-28 Thread Philippe Soares
Hi,
I'm trying to apply the Sort By Function capability
(https://wiki.apache.org/solr/FunctionQuery#Sort_By_Function) of Solr
to solve the following use case:

I have a country field in my index, with values like 'US', 'FR', 'UK',
etc...

Then I want our users to be able to define the order of their preferred
countries so that grouped results are sorted according to their preference.

I need something like the map function, which assigns a number to each
country code and uses that for sorting, based on the users' preference.

I tried to sort my groups by adding something like map(country, 'FR', 'FR',
1) to the field list, but map seems to only work for numerical values. I
get errors like :

Error parsing fieldname: Expected float instead of quoted string:FR

Is there any other function that would allow me to map from a predefined
String constant into an Integer that I can sort on ?

Thanks in advance.


Re: Sorting by function

2015-08-28 Thread Chris Hostetter

: I have a country field in my index, with values like 'US', 'FR', 'UK',
: etc...
: 
: Then I want our users to be able to define the order of their preferred
: countries so that grouped results are sorted according to their preference.
...
: Is there any other function that would allow me to map from a predefined
: String constant into an Integer that I can sort on ?

Because of how they evolved, and most of the common use cases for them, 
there aren't a lot of functions that operate on strings.

Assuming your country field is a single valued (indexed) string field, 
then what you want can be done fairly simply using the termfreq() 
function.

termfreq(country,US) will return the (raw integer) term frequency for 
Term(country,US) for each doc -- assuming it's single valued (and not 
tokenized) that means for every doc it will be either a 0 or a 1.

so you can either modify your earlier attempt at using map on the string 
values to do a map over the termfreq output, or you can simplify things to 
just multiply and take the max value -- where max is just shorthand for 
picking the non-0 value ...

max(mul(9,termfreq(country,US)),
mul(8,termfreq(country,FR)),
mul(7,termfreq(country,UK)),
...)

Things get more interesting/complicated if the field isn't single valued, 
or is tokenized -- then individual values (like US) might have a 
termfreq that is greater than 1, or a doc might have more than one value, 
and you have to decide what kind of math operation you want to apply over 
those...

  * ignore termfreqs and only look at whether the term exists? 
- wrap each termfreq in map to force the value to either 0 or 1
  * want to sort by the sum of (weight * termfreq) for each term?
- change max to sum in the above example
  * ignore all but the main term that has the highest freq for each doc?
- not easy at query time - best to figure out the main term at index 
  time and put it in its own field.
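
Putting the max()/termfreq() expression above into an actual request might
look like this (a sketch; host, core name, and weights assumed):

curl "http://localhost:8983/solr/core1/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "sort=max(mul(9,termfreq(country,'US')),mul(8,termfreq(country,'FR')),mul(7,termfreq(country,'UK'))) desc"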


-Hoss
http://www.lucidworks.com/


Re: Sorting by function

2015-08-28 Thread Philippe Soares
Thanks, Chris! I have the country as a single-valued field, so your solution
works perfectly!

On Fri, Aug 28, 2015 at 1:22 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 <snip>



Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
Erik's version might be better with tabs, though, to avoid CSV's
requirements on escaping commas, quotes, etc. And maybe trim those
fields a bit, either in awk or in a URP inside Solr.

But it would definitely work.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 28 August 2015 at 12:39, Erik Hatcher erik.hatc...@gmail.com wrote:
 <snip>



ping handler very doubtful

2015-08-28 Thread Davis, Daniel (NIH/NLM) [C]
So, I tested that the PingRequestHandler works in the following fashion:

cd server/corename/data/index
# some work with ls and awk to produce a script, and then it runs
dd if=/dev/urandom of=`pwd`/segments_10 bs=160 count=1
dd if=/dev/urandom of=`pwd`/_u.fdt bs=41512 count=1
dd if=/dev/urandom of=`pwd`/_u.fdx bs=100 count=1
dd if=/dev/urandom of=`pwd`/_u.fnm bs=1573 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.doc bs=16824 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.pos bs=17677 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.tim bs=54010 count=1
dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.tip bs=1466 count=1
dd if=/dev/urandom of=`pwd`/_u.nvd bs=966 count=1
dd if=/dev/urandom of=`pwd`/_u.nvm bs=140 count=1
dd if=/dev/urandom of=`pwd`/_u.si bs=409 count=1

This did not cause any immediate problems with the PingRequestHandler because 
the query was cached.
Worth a bug?
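
For context, a stock ping handler issues a fixed query, which is exactly the
kind of request the query result cache can answer without touching the index
files -- the declaration looks roughly like:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
</requestHandler>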

It did of course cause problems for the health monitor following a RELOAD or 
a complete Solr restart, which was enough for me.

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



Re: Indexing Fixed length file

2015-08-28 Thread Erik Hatcher
Ah yes, I should have made my example use tabs, though that currently would 
have required also adding “separator=%09” to the params.

I definitely support the use of tabs for what they were intended: delimiting 
columns of data.  +1, thanks for that mention, Alex.






 On Aug 28, 2015, at 1:38 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
 
 <snip>



Dynamic field rule plugin?

2015-08-28 Thread Hari Iyer
Hi,

I am new to Solr and am trying to create dynamic field rules in my Schema. 

I would like to use field name suffixes to indicate other properties besides
the data type and multiValued, as provided in the default schema.

It appears that specifying this via patterns leads to duplication, as all the
various combinations have to be spelled out. It would help to have a way to
build up parts of the rule,

e.g. if suffix has '_s' then set stored=true

if suffix has '_m' then set multivalued=true

and so on

 

From the documentation and various implementation examples (drupal etc) I
can only see them specifying all combinations.
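
That is, every combination seems to need its own declaration, roughly like
this (suffixes and types hypothetical):

<dynamicField name="*_t"     type="text_general" indexed="true" stored="false"/>
<dynamicField name="*_t_s"   type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_t_m"   type="text_general" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_t_s_m" type="text_general" indexed="true" stored="true"  multiValued="true"/>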

Is there any way (plugin?) to incrementally build the rule?

Thanks,

Hari

 



RE: Data Import Handler Stays Idle

2015-08-28 Thread Allison, Timothy B.
Only a month late to respond, and the response likely won't help.

I agree with Shawn that Tika can be a memory hog.  I try to leave 1GB per 
thread, but your mileage will vary dramatically depending on your docs.  I'd 
expect that you'd get an OOM, though, somewhere...

There have been rare bugs in various parsers, including the PDFParser, in 
various versions of Tika that cause permanent hangs.  I haven't experimented 
with DIH and known trigger files, but I suspect you'd get the behavior that 
you're seeing if this were to happen.

So, short of rolling your own ETL in lieu of DIH, hardening DIH to run Tika 
in a different process (tika-server, perhaps -- 
https://issues.apache.org/jira/browse/SOLR-7632), or going big with Hadoop, 
morphlines, etc., your only hope is to upgrade Tika and hope that this was one 
of the bugs we've already identified and fixed.

If you do go with morphlines...I don't think this has been fixed yet: 
https://github.com/kite-sdk/kite/issues/397

Did you ever figure out what was going wrong?

Best,

 Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, July 21, 2015 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Data Import Handler Stays Idle

On 7/21/2015 8:17 AM, Paden wrote:
 There are some zip files inside the directory and have been addressed 
 to in the database. I'm thinking those are the one's it's jumping 
 right over. They are not the issue. At least I'm 95% sure. And Shawn 
 if you're still watching I'm sorry I'm using solr-5.1.0.

Have you started Solr with a larger heap than the default 512MB in Solr 5.x?  
Tika can require a lot of memory.  I would have expected there to be 
OutOfMemoryError exceptions in the log if that were the problem, though.

You may need to use the -m option on the startup scripts to increase the max 
heap.  Starting with -m 2g would be a good idea.

Also, seeing the entire multi-line IOException from the log (which may be 
dozens of lines) could be important.

Thanks,
Shawn



RE: Data Import Handler Stays Idle

2015-08-28 Thread Allison, Timothy B.
 There are some zip files inside the directory and have been addressed 
 to in the database. I'm thinking those are the one's it's jumping 
 right over.

With SOLR-7189, which should have kicked in for 5.1, Tika shouldn't skip over 
zip files; it should process all of their contents and concatenate the 
extracted text into one string.


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
<snip>



PingRequestHandler and file corruption

2015-08-28 Thread Davis, Daniel (NIH/NLM) [C]
This is a resend to correct my awful subject.

From: Davis, Daniel (NIH/NLM) [C]
Sent: Friday, August 28, 2015 2:15 PM
To: solr-user@lucene.apache.org
Subject: ping handler very doubtful

<snip>