[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539412#comment-15539412 ]

Alexandre Rafalovitch commented on SOLR-2731:
-

For exporting a significant amount of data, we now have the [/export|https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets] handler. Specifically, for importing into another instance, there is [DIH with SolrEntityProcessor|https://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor]. Would either of those have fulfilled the need?

The output of this writer, as proposed, could not even be loaded back into Solr; that would require updating a different component and a completely new discussion.

> CSVResponseWriter should optionally return numfound
> ---
>
> Key: SOLR-2731
> URL: https://issues.apache.org/jira/browse/SOLR-2731
> Project: Solr
> Issue Type: Improvement
> Components: Response Writers
> Affects Versions: 3.1, 3.3, 4.0-ALPHA
> Reporter: Jon Hoffman
> Labels: patch
> Fix For: 3.1.1, 4.9, 6.0
>
> Attachments: SOLR-2731-R1.patch, SOLR-2731.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> an optional parameter "csv.numfound=true" can be added to the request which
> causes the first line of the response to be the numfound. This would have no
> impact on existing behavior, and those who are interested in that value can
> simply read off the first line before sending to their usual csv parser.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539209#comment-15539209 ]

jmlucjav commented on SOLR-2731:

This would still be a nice addition for a very specific case (one I faced recently): you want to export a lot of data, so you combine wt=csv with the cursorMark feature and reindex the output in another Solr instance. I managed without it, but this would have been a cleaner way.
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15538833#comment-15538833 ]

Alexandre Rafalovitch commented on SOLR-2731:
-

Is there still a desire to augment the CSV output, or have the JSON and /export handlers proved sufficient?
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091735#comment-13091735 ]

Erik Hatcher commented on SOLR-2731:

Perhaps we could have an Excel response writer that could create a multi-sheet spreadsheet file?

> CSVResponseWriter should optionally return numfound
> ---
>
> Key: SOLR-2731
> URL: https://issues.apache.org/jira/browse/SOLR-2731
> Project: Solr
> Issue Type: Improvement
> Components: Response Writers
> Affects Versions: 3.1, 3.3, 4.0
> Reporter: Jon Hoffman
> Labels: patch
> Fix For: 3.1.1, 3.3, 4.0
>
> Attachments: SOLR-2731.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> an optional parameter "csv.numfound=true" can be added to the request which
> causes the first line of the response to be the numfound. This would have no
> impact on existing behavior, and those who are interested in that value can
> simply read off the first line before sending to their usual csv parser.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091734#comment-13091734 ]

Erik Hatcher commented on SOLR-2731:

I'm mostly with Lance here, actually. I want *pure* CSV. So long as there is always an option (which should be the default) to keep the output pure CSV, then I'm ok with whatever extras folks want to add as options.

We really should get the response writer framework able to return custom HTTP headers though.
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091697#comment-13091697 ]

Lance Norskog commented on SOLR-2731:
-

-1

* When you do the same query twice, the second time it usually takes 0ms. If it doesn't, turn on query caching.
* You can code these variations with Velocity.

I would stick with keeping the very simplest CSV output and then coding any additions yourself.
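One reading of the "same query twice" point is that a client can fetch the count with a separate, cheap request and leave the CSV response untouched. The sketch below only builds the two request URLs; the base URL and the helper name are hypothetical, while `q`, `rows`, and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def count_and_csv_urls(base_url, query):
    """Build two request URLs for the same query.

    The first asks for zero rows as JSON, so the client can read numFound
    from the response; the second asks for the documents as plain CSV.
    base_url is a hypothetical select endpoint supplied by the caller.
    """
    count_params = {"q": query, "rows": 0, "wt": "json"}
    csv_params = {"q": query, "wt": "csv"}
    count_url = f"{base_url}?{urlencode(count_params)}"
    csv_url = f"{base_url}?{urlencode(csv_params)}"
    return count_url, csv_url


# Hypothetical local endpoint, for illustration only.
count_url, csv_url = count_and_csv_urls(
    "http://localhost:8983/solr/select", "title:solr"
)
```

The trade-off is a second round trip, which is cheap when the query cache is warm, as the comment above suggests.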
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091483#comment-13091483 ]

Hoss Man commented on SOLR-2731:

i think yonik's 1st example would be the best for people loading the data into a spreadsheet tool or parsing with conventional CSV tools. (even better than #2 because it's easy to cut/paste that data into a different sheet and still have clean separation between headers/data)

but i would suggest that if we're at the point of thinking about having a "metadata" section and a "results" section we shouldn't limit ourselves to two sections. instead of just including metadata about the main doclist, we could allow arbitrary sections of arbitrary lengths (like facet counts) ...

i haven't thought hard about what the params should look like, but i would suggest that for easy output parsing a simple 1 row/column row count prefix value telling you the number of (csv) rows for each "section", followed by the (csv) rows of data (including a header row for each section if "csv.header=true") would be easy for people to parse (assuming they were expecting it because they asked for it)

ie...

{noformat}
2
numFound,maxScore,start
103,1.414,100
4
id,score
doc1,1.3
doc2,1.1
doc3,1.05
{noformat}

..or if csv.header=false ...

{noformat}
1
103,1.414,100
3
doc1,1.3
doc2,1.1
doc3,1.05
{noformat}

We can worry about what other "sections" might be supported later as long as the basic param syntax gets fleshed out ...

i would suggest maybe something like:

* multivalued "csv.section" param
* sections are written out in the order that they are passed as param
* default is "csv.section=results"
* if only one value is specified for csv.section, then no row count prefix is used for that section
* only one other value for csv.section supported initially: "csv.section=results.meta"
** adds the numFound,maxScore,start for the results
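To see how a client might consume the row-count-prefixed layout suggested above, here is a minimal Python sketch. It assumes every section is preceded by a line holding only its CSV row count (that is, more than one section was requested); the function name is hypothetical and nothing here reflects an actual Solr implementation.

```python
import csv
import itertools


def parse_sections(body):
    """Split a row-count-prefixed response into a list of sections.

    Each section is returned as a list of CSV rows (lists of strings).
    The prefix counts CSV rows, so the csv reader (not a raw line count)
    decides where a section ends; quoted fields that contain newlines
    are therefore still counted as a single row.
    """
    line_iter = iter(body.splitlines(keepends=True))
    sections = []
    for prefix in line_iter:
        count = int(prefix.strip())
        # The reader pulls lines from the shared iterator only as needed,
        # so after `count` rows the iterator sits at the next prefix line.
        reader = csv.reader(line_iter)
        sections.append(list(itertools.islice(reader, count)))
    return sections


sample = (
    "2\n"
    "numFound,maxScore,start\n"
    "103,1.414,100\n"
    "4\n"
    "id,score\n"
    "doc1,1.3\n"
    "doc2,1.1\n"
    "doc3,1.05\n"
)
meta_section, results_section = parse_sections(sample)
```

With `csv.header=true` the first row of each section is its header; with `csv.header=false` the same function applies, and the caller has to know the column order in advance.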
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091227#comment-13091227 ]

Simon Rosenthal commented on SOLR-2731:
---

good point. In that case I'm agnostic - 1) would be fine.
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091205#comment-13091205 ]

Jon Hoffman commented on SOLR-2731:
---

Simon,

Keep in mind that this additional header would only appear if you asked for it via a request parameter like "csv.metaheader=true". Existing behavior would remain unchanged. Is that still a problem?
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091187#comment-13091187 ]

Simon Rosenthal commented on SOLR-2731:
---

In addition to loading CSV results into a spreadsheet, I often use CSV as a quick-and-dirty way of dumping the contents of an index to be re-read into Solr, and adding lines which would need manual removal would be rather inconvenient.

I'd go for option 4, with the comment symbol and result metadata on one line. org.apache.commons.csv has an option (which is not currently enabled in the CSVRequestHandler) to recognize and discard comment lines - adding a request parameter to the handler to recognize comment lines would be straightforward, and would at least solve my use case, though I admit not all others.
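The comment-line variant (option 4) can be illustrated from the client side with a short Python sketch. This is not the Solr handler or commons-csv; the function name and the default `#` marker are assumptions for illustration, and the sketch assumes no data line begins with the comment character.

```python
import csv


def parse_with_comment_header(body, comment_char="#"):
    """Return (metadata dict, list of result rows as dicts).

    Lines starting with comment_char are treated as key=value metadata,
    e.g. "#numFound=2038,maxScore=1.414,start=100", and are kept out of
    the CSV data. Assumption: no data line starts with comment_char,
    including continuation lines of multi-line quoted fields.
    """
    meta = {}
    data_lines = []
    for line in body.splitlines(keepends=True):
        if line.startswith(comment_char):
            for pair in line[len(comment_char):].strip().split(","):
                key, _, value = pair.partition("=")
                meta[key] = value
        else:
            data_lines.append(line)
    return meta, list(csv.DictReader(data_lines))


sample = (
    "#numFound=2038,maxScore=1.414,start=100\n"
    "id,score\n"
    "doc1,1.3\n"
    "doc2,1.1\n"
    "doc3,1.05\n"
)
meta, docs = parse_with_comment_header(sample)
```

A reader that simply discards comment lines, as described above for re-loading into Solr, would keep only the `else` branch.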
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091119#comment-13091119 ]

Jon Hoffman commented on SOLR-2731:
---

To be clear, I like the first option best.
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1309#comment-1309 ]

Jon Hoffman commented on SOLR-2731:
---

I like maintaining consistency with the CSV format because you don't have to reinvent any parsing logic. It should be pretty easy for the client developer to read off the first two lines and parse them with the same tool that's used to parse the rest of the document. Preferences around separator, newline, etc. can be reused (except maybe this meta header should always have a column name header).

What should the parameter be called? csv.metaheader?
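The "read off the first two lines" approach can be sketched in Python as follows. It assumes the metadata always arrives as a header row plus one value row, and that the results keep their own header row; the function name is hypothetical.

```python
import csv


def split_meta_header(body):
    """Split a response into (metadata dict, list of result rows as dicts).

    Assumes the first two lines are a metadata header row and a metadata
    value row, and that the remaining lines are ordinary CSV with a
    header row. The same csv module parses both parts.
    """
    lines = body.splitlines(keepends=True)
    meta_names, meta_values = csv.reader(lines[:2])
    meta = dict(zip(meta_names, meta_values))
    docs = list(csv.DictReader(lines[2:]))
    return meta, docs


sample = (
    "numFound,maxScore,start\n"
    "2038,1.414,100\n"
    "id,score\n"
    "doc1,1.3\n"
    "doc2,1.1\n"
    "doc3,1.05\n"
)
meta, docs = split_meta_header(sample)
```

Because both parts go through the same parser, any separator or quoting settings could be passed once and reused for the metadata and the results.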
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091094#comment-13091094 ]

Yonik Seeley commented on SOLR-2731:

It seems like if we go down this road, it should somehow be a more generic mechanism (since others will then want values like maxScore, etc).

Here are some alternatives:

{code}
numFound,maxScore,start
2038,1.414,100
id,score
doc1,1.3
doc2,1.1
doc3,1.05

numFound,2038,maxScore,1.414,start,100
id,score
doc1,1.3
doc2,1.1
doc3,1.05

numFound=2038,maxScore=1.414,start=100
id,score
doc1,1.3
doc2,1.1
doc3,1.05

#numFound=2038,maxScore=1.414,start=100
id,score
doc1,1.3
doc2,1.1
doc3,1.05
{code}

Perhaps the "numFound=2038,maxScore=1.414,start=100" form would be the most human readable (perhaps written as a comment line, if that's supported). But the first option could be attractive since it's more in the spirit of CSV and might be desirable if imported into Excel, for example.

Thoughts?