Re: Indexing Fixed length file

2015-09-04 Thread timmsn
Hi Guys,


Thanks for the answers, you helped me a lot. I wrote a PHP script for this
problem.


Thank you




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807p4227163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing Fixed length file

2015-08-28 Thread Erik Hatcher
Ah yes, I should have made my example use tabs, though that currently would 
have required also adding “&separator=%09” to the params.

I definitely support the use of tabs for what they were intended: delimiting 
columns of data.  +1, thanks for that mention, Alex.
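
A tab-delimited variant of the awk step could look roughly like the sketch below. It reuses the Q36 toy input and the same two substr() columns; the function name split_fixed_tsv is made up for illustration. The output would still have to be posted with separator=%09 added to the params, as noted above.

```shell
# Hypothetical helper: split each line into a 1-character id and a
# 2-character val, joined by a tab instead of a comma.
split_fixed_tsv() {
  awk -v OFS='\t' '{ print substr($0, 1, 1), substr($0, 2, 2) }'
}

# Toy input from the thread; prints the two columns separated by a tab.
printf 'Q36\n' | split_fixed_tsv
```

With tabs as the delimiter, commas and quotes inside a field need no special handling, as long as the data itself contains no tab characters.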









Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
Erik's version might be better with tabs, though, to avoid CSV's
requirements on escaping commas, quotes, etc. And maybe trim those
fields a bit, either in awk or in a URP inside Solr.

But it would definitely work.
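
Trimming in awk could be sketched as follows. The two 14-character column widths are an assumption made only for this illustration (the real record layout is not given in this thread), and trim_fixed_fields and trim are hypothetical names.

```shell
# Hypothetical helper: cut two fixed-width columns (assumed widths: 14 and 14)
# and strip leading and trailing spaces from each before joining with a tab.
trim_fixed_fields() {
  awk -v OFS='\t' '
    function trim(s) { sub(/^ +/, "", s); sub(/ +$/, "", s); return s }
    { print trim(substr($0, 1, 14)), trim(substr($0, 15, 14)) }
  '
}

# Two space-padded columns in, two trimmed tab-separated values out.
printf 'AB134364312   58553521789   \n' | trim_fixed_fields
```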

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/




Re: Indexing Fixed length file

2015-08-28 Thread Erik Hatcher
How about this incantation:

$ bin/solr create -c fw
$ echo "Q36" | awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' | 
bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv -d
$ curl 'http://localhost:8983/solr/fw/select?q=*:*&wt=csv'
val,_version_,id
36,1510767115252006912,Q

With a big bunch of data, the stdin detection of bin/post doesn’t work well so 
I’d certainly recommend going to an intermediate real file (awk... > data.csv ; 
bin/post … data.csv) instead.
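
One way to do the intermediate-file step is sketched below. The temporary directory, the file names and the second record R47 are placeholders for illustration; only the awk command itself comes from the example above.

```shell
# Sketch: convert the fixed-width records into a real CSV file first.
workdir=$(mktemp -d)
printf 'Q36\nR47\n' > "$workdir/records.txt"   # stand-in for the real input file

awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' \
  "$workdir/records.txt" > "$workdir/data.csv"

# Inspect the result before indexing it.
cat "$workdir/data.csv"
# data.csv would then be handed to bin/post as a file argument
# instead of being piped in through stdin.
```

Keeping the converted file around also makes it easy to check the column boundaries by eye before anything is sent to Solr.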


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com







Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
If you use DataImportHandler, you can combine LineEntityProcessor with
RegexTransformer to split each line into a bunch of fields:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheRegexTransformer

You could then trim the whitespace in an UpdateRequestProcessor chain
that you set up to run after DIH, using the TrimFieldUpdate URP:
http://www.solr-start.com/info/update-request-processors/#TrimFieldUpdateProcessorFactory

I think this should do the job. With bin/post, you could set up a
custom URP chain as well, but it does not have an equivalent of
RegexTransformer that splits into multiple other fields. Not that it
would be hard to write one, it's just that nobody has done it yet.
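
The regex-splitting idea can be tried outside Solr first. The sketch below is only a shell illustration of the pattern (one capture group per fixed-width column), not DIH configuration; the column widths come from the Q36 toy example elsewhere in this thread and split_by_regex is a made-up name.

```shell
# Hypothetical helper: one regular expression with a capture group per
# fixed-width column (1 character, then 2 characters); the rest of the
# line is ignored. The groups are written out comma-separated.
split_by_regex() {
  sed -E 's/^(.{1})(.{2}).*$/\1,\2/'
}

printf 'Q36\n' | split_by_regex
```

Once the expression captures the right columns here, the same pattern is a reasonable starting point for the RegexTransformer setup.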

Regards,
   Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/




Re: Indexing Fixed length file

2015-08-28 Thread Steve Rowe
Hi Tim,

I haven’t heard of people indexing this kind of input with Solr, but the format 
is quite similar to CSV/TSV files, with the exception that the field separators 
have fixed positions and are omitted.

You could write a short script to insert separators (e.g. commas) at these 
points (but be sure to escape quotation marks and the separators) and then use 
Solr’s CSV update functionality: 
<https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates>.
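
Such a script could be as small as the awk sketch below. The column boundaries (12 characters, then the rest of the line) and the names to_csv and csvq are assumptions for illustration only. Each field is wrapped in double quotes and embedded quotes are doubled, which is the usual CSV convention, so commas inside a field are harmless.

```shell
# Hypothetical helper: turn a fixed-width line into two quoted CSV fields.
to_csv() {
  awk '
    # Quote a field CSV-style: double embedded quotes, then wrap in quotes.
    function csvq(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    { print csvq(substr($0, 1, 12)) "," csvq(substr($0, 13)) }
  '
}

# The first field contains a comma and the second contains quotes.
printf 'MUELLER, MAXsay "hi"\n' | to_csv
```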

I think dealing with fixed-width fields directly would be a nice addition to 
Solr’s CSV update capabilities - feel free to make an issue - see 
<http://wiki.apache.org/solr/HowToContribute>.

Steve
www.lucidworks.com




Re: Indexing Fixed length file

2015-08-28 Thread Erick Erickson
Solr doesn't know anything about such a file. The post program expects
well-defined structures, see the xml and json formats in example/exampledocs.

So you either have to transform the data into the form expected by the bin/post
tool or perhaps you can use the CSV import, see:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates

Best,
Erick



Indexing Fixed length file

2015-08-28 Thread timmsn
Hello,

I use Solr 5.2.1 and the bin/post tool. I am trying to index some files whose
records have a fixed length and no whitespace to separate the words.
How can I program a template or something similar for my fields?
Or can I edit the schema.xml to solve my problem?

This is one record from one file; each file contains 40 to 100 records.

AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062 
130445 


Thanks! 

Tim


