Re: Problem with CSV update handler
Hello!

Once again, thanks for the response ;)

So the solution is to regenerate the data files and either add a space after the doubled encapsulator or change the encapsulator to a character that does not occur in the field values (for the field that will be split, of course).

--
Regards,
Rafał Kuć
http://solr.pl
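The second option above (a different encapsulator for the split field) could look roughly like the sketch below. It assumes the encapsulator characters missing from the archived examples were double quotes, as implied by encapsulator=%22, and it adds the f.title.encapsulator=%27 parameter suggested by Yonik in this thread. The variable and function names are only illustrative, and whether the resulting sub-values are the intended ones depends on data the thread does not show.

```shell
# Assumption: the archived data lost its double quotes; this is a reconstruction.
base_url='http://localhost:8983/solr/update/csv'
params='fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20&f.title.encapsulator=%27'
data='1,"aaa ""bbb""ccc"'

# Wrapped in a function so that nothing is sent until post_csv is called
# against a running Solr instance.
post_csv() {
  curl "${base_url}?${params}" -H 'Content-type:text/plain' -d "${data}"
}

# Show the request that would be sent.
printf '%s\n' "${base_url}?${params}"
printf '%s\n' "${data}"
```

With a single quote as the sub-value encapsulator, the double quotes left after the first decoding pass should be treated as ordinary characters when the field is split.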
Re: Problem with CSV update handler
On Tue, Jun 21, 2011 at 2:15 AM, Rafał Kuć r@solr.pl wrote:
> Hello! Once again, thanks for the response ;) So the solution is to
> regenerate the data files and either add a space after the doubled
> encapsulator

Maybe... I can't tell whether the file is encoded correctly, since your example doesn't show what the decoded values are supposed to be.

-Yonik
http://www.lucidimagination.com

> or change the encapsulator to a character that does not occur in the
> field values (for the field that will be split, of course).
>
> --
> Regards,
> Rafał Kuć
> http://solr.pl
Problem with CSV update handler
Hello!

I have a question about the CSV update handler. Let's say I send the following file to the CSV update handler using curl:

id,name
1,"aaa ""bbb""ccc"

It throws an error:

Error 400 java.io.IOException: (line 0) invalid char between encapsulated token end delimiter

If I change the contents of the file to:

id,name
1,"aaa ""bbb"" ccc"

it works without a problem. Has anyone encountered this? Is it known behavior?

--
Regards,
Rafał Kuć
Re: Problem with CSV update handler
This works fine for me:

curl http://localhost:8983/solr/update/csv -H 'Content-type:text/plain' -d 'id,name
1,"aaa ""bbb"" ccc"'

-Yonik
http://www.lucidimagination.com

On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć r@solr.pl wrote:
> Hello!
>
> I have a question about the CSV update handler. Let's say I send the
> following file to the CSV update handler using curl:
>
> id,name
> 1,"aaa ""bbb""ccc"
>
> It throws an error:
>
> Error 400 java.io.IOException: (line 0) invalid char between encapsulated token end delimiter
>
> If I change the contents of the file to:
>
> id,name
> 1,"aaa ""bbb"" ccc"
>
> it works without a problem. Has anyone encountered this? Is it known
> behavior?
Re: Problem with CSV update handler
Hi!

Yonik, thanks for the reply. I just realized that the example I gave was incomplete: Solr returns the error only when the field is multivalued and its values are split. For example, the following curl command gives me the error I mentioned:

curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,"aaa ""bbb""ccc"'

while the following runs without any problem:

curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,"aaa ""bbb"" ccc"'

The only difference between the two is the additional space character between bbb and ccc in the second example. Am I doing something wrong? ;)

--
Regards,
Rafał Kuć
http://solr.pl
Re: Problem with CSV update handler
Multi-valued CSV fields are double encoded. We start with:

"aaa ""bbb""ccc"

Then, decoding one level, we get:

aaa "bbb"ccc

Decoding again to get the individual values results in a decode error, because the encapsulator appears unescaped in the middle of the second value (i.e. invalid CSV).

An easier way to fix this is to use a different encapsulator for the sub-values of a multi-valued field by adding f.title.encapsulator=%27 (a single quote char).

But I can't really tell you exactly how to encode the data or which options to pass to the CSV loader when I don't know what actual values you want after "aaa ""bbb""ccc" is decoded.

-Yonik
http://www.lucidimagination.com

On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć r@solr.pl wrote:
> Hi!
>
> Yonik, thanks for the reply. I just realized that the example I gave
> was incomplete: Solr returns the error only when the field is
> multivalued and its values are split. For example, the following curl
> command gives me the error I mentioned:
>
> curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,"aaa ""bbb""ccc"'
>
> while the following runs without any problem:
>
> curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,"aaa ""bbb"" ccc"'
>
> The only difference between the two is the additional space character
> between bbb and ccc in the second example. Am I doing something
> wrong? ;)
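The two decoding passes described in this message can be traced by hand with standard shell tools. This is only an illustration, not how Solr's CSV loader is implemented, and it assumes the field value was encapsulated with double quotes (encapsulator=%22), which the archived examples no longer show.

```shell
# Field value as it would appear in the CSV body (assumed reconstruction).
raw='"aaa ""bbb""ccc"'

# First pass: strip the outer encapsulators, then collapse doubled ones.
level1=$(printf '%s' "$raw" | sed -e 's/^"//' -e 's/"$//' -e 's/""/"/g')
printf '%s\n' "$level1"    # prints: aaa "bbb"ccc

# Second pass: the field is split on the separator (a space). Taking the
# text after the first space gives the second sub-value.
second=${level1#* }
printf '%s\n' "$second"    # prints: "bbb"ccc
```

The second sub-value opens with an encapsulator, closes it after bbb, and is then followed directly by ccc, which matches the "invalid char between encapsulated token end delimiter" error. With a space after the doubled encapsulator, the separator falls right after the closing quote instead, so each sub-value is well formed.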