Re: Indexing Fixed length file
Hi guys,

thanks for the answers, you helped me a lot. I wrote a PHP script for this problem.

Thank you

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807p4227163.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing Fixed length file
Ah yes, I should have made my example use tabs, though that would currently also have required adding "&separator=%09" to the params. I definitely support the use of tabs for what they were intended: delimiting columns of data.

+1, thanks for that mention, Alex.

> On Aug 28, 2015, at 1:38 PM, Alexandre Rafalovitch wrote:
>
> Erik's version might be better with tabs though to avoid CSV's
> requirements on escaping commas, quotes, etc. And maybe trim those
> fields a bit either in awk or in URP inside Solr.
> [...]
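
A minimal sketch of the tab-delimited variant discussed above, assuming the same two-field layout as the "Q36" example (column 1 is the id, columns 2-3 are the value). The function name `split_tabs` is made up for illustration; the trimming is done in awk rather than in a URP.

```shell
# Hypothetical layout: column 1 is the id, columns 2-3 are the value.
split_tabs() {
  awk -v OFS='\t' '
    # Strip leading and trailing blanks from one field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    { print trim(substr($0, 1, 1)), trim(substr($0, 2, 2)) }
  '
}

# Prints the two fields separated by a tab character.
printf 'Q36\n' | split_tabs
```

With the output redirected to a file (say data.tsv, a placeholder name), posting it would presumably look like `bin/post -c fw -params "fieldnames=id,val&header=false&separator=%09" -type text/csv data.tsv`. Tabs inside the data itself would still need handling.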
Re: Indexing Fixed length file
Erik's version might be better with tabs, though, to avoid CSV's requirements on escaping commas, quotes, etc. And maybe trim those fields a bit, either in awk or in a URP inside Solr.

But it would definitely work.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 28 August 2015 at 12:39, Erik Hatcher wrote:
> How about this incantation:
> [...]
Re: Indexing Fixed length file
How about this incantation:

$ bin/solr create -c fw
$ echo "Q36" | awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' |
    bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv -d
$ curl 'http://localhost:8983/solr/fw/select?q=*:*&wt=csv'
val,_version_,id
36,1510767115252006912,Q

With a big bunch of data, the stdin detection of bin/post doesn’t work well, so I’d certainly recommend going to an intermediate real file (awk... > data.csv ; bin/post … data.csv) instead.

--
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com

> On Aug 28, 2015, at 3:19 AM, timmsn wrote:
>
> Hello,
>
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length fields and no whitespace to separate the words.
> [...]
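
For records with more than two fields, the same substr idea can be driven by a list of widths. This is only a sketch: the function name `fixed_to_csv` and the widths "1 2" are assumptions for illustration, not the real layout of Tim's file, and it does no escaping of commas or quotes.

```shell
# Cut each input line into consecutive fields of the given widths and
# join them with commas. The widths are an assumption for illustration.
fixed_to_csv() {
  awk -v widths="$1" -v OFS=, '
    BEGIN { n = split(widths, w, " ") }
    {
      pos = 1
      line = ""
      for (i = 1; i <= n; i++) {
        # Append the next field, preceded by the separator after the first.
        line = line (i > 1 ? OFS : "") substr($0, pos, w[i])
        pos += w[i]
      }
      print line
    }
  '
}

printf 'Q36\nR42\n' | fixed_to_csv "1 2"
```

To go through an intermediate file, something like `fixed_to_csv "1 2" < input.txt > data.csv` followed by `bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv data.csv` should do, where input.txt is a placeholder for the fixed-width source file.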
Re: Indexing Fixed length file
If you use DataImportHandler, you can combine LineEntityProcessor with RegexTransformer to split each line into a bunch of fields:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheRegexTransformer

You could then trim the whitespace with an UpdateRequestProcessor (URP) chain set up to run after DIH, using the TrimFieldUpdate URP:
http://www.solr-start.com/info/update-request-processors/#TrimFieldUpdateProcessorFactory

I think this should do the job.

With bin/post, you could set up a custom URP chain as well, but there is no URP equivalent of RegexTransformer that splits one field into multiple other fields. Not that it would be hard to write one, just nobody has yet.

Regards,
Alex.

On 28 August 2015 at 03:19, timmsn wrote:
> Hello,
>
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length fields and no whitespace to separate the words.
> [...]
Re: Indexing Fixed length file
Hi Tim,

I haven’t heard of people indexing this kind of input with Solr, but the format is quite similar to CSV/TSV files, with the exception that the field separators have fixed positions and are omitted.

You could write a short script to insert separators (e.g. commas) at these points (but be sure to escape quotation marks and the separators) and then use Solr’s CSV update functionality: <https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates>.

I think dealing with fixed-width fields directly would be a nice addition to Solr’s CSV update capabilities - feel free to make an issue - see <http://wiki.apache.org/solr/HowToContribute>.

Steve
www.lucidworks.com

> On Aug 28, 2015, at 3:19 AM, timmsn wrote:
>
> Hello,
>
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length fields and no whitespace to separate the words.
> [...]
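
The escaping step mentioned above could be sketched as follows: each field is trimmed, embedded double quotes are doubled, and the field is wrapped in double quotes so that commas inside it are harmless. The function name `quote_fields` and the layout (2 characters, then 14) are assumptions for illustration only.

```shell
# Hypothetical layout: columns 1-2 are a code, columns 3-16 are a name.
quote_fields() {
  awk '
    # Trim blanks, double embedded quotes, and wrap the field in quotes.
    function csv(s) {
      gsub(/^[ \t]+|[ \t]+$/, "", s)
      gsub(/"/, "\"\"", s)
      return "\"" s "\""
    }
    { print csv(substr($0, 1, 2)) "," csv(substr($0, 3, 14)) }
  '
}

# The comma inside the name stays within one quoted field.
printf '11MUELLER, MAX  \n' | quote_fields
```

This relies on the usual CSV convention of double-quote encapsulation with doubled quotes as the escape; whether the defaults of the CSV handler match is worth checking against the documentation linked above.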
Re: Indexing Fixed length file
Solr doesn't know anything about such a file. The post program expects well-defined structures; see the XML and JSON formats in example/exampledocs.

So you either have to transform the data into the form expected by the bin/post tool, or perhaps you can use the CSV import, see:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates

Best,
Erick

On Fri, Aug 28, 2015 at 12:19 AM, timmsn wrote:
> Hello,
>
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have fixed-length fields and no whitespace to separate the words.
> [...]
Indexing Fixed length file
Hello,

I use Solr 5.2.1 and the bin/post tool. I am trying to index some files that have fixed-length fields and no whitespace to separate the words.
How can I program a template or something similar for my fields?
Or can I edit the schema.xml for my problem?

This is one record from one file; each file contains 40 to 100 records.

AB134364312 58553521789 245678923521234130311G11222345610711MUELLER, MAX -00014680Q1-24579021-204052667980002 EEUR 0223/123835062 130445

Thanks!

Tim

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807.html
Sent from the Solr - User mailing list archive at Nabble.com.