[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273176#comment-14273176 ]

Alexandre Rafalovitch commented on SOLR-6959:
---------------------------------------------

This output is in my book's current draft. You bet I don't want to explain why two different invocations do different things. Unless they actually do different things. :-)

> SimplePostTool reports incorrect base url for PDFs
> --------------------------------------------------
>
> Key: SOLR-6959
> URL: https://issues.apache.org/jira/browse/SOLR-6959
> Project: Solr
> Issue Type: Bug
> Components: scripts and tools
> Affects Versions: 4.10.3
> Reporter: Alexandre Rafalovitch
> Assignee: Erik Hatcher
> Priority: Minor
> Labels: tools
> Fix For: 5.0, Trunk
>
>
> {quote}
> $ java -Dc=techproducts -Dauto -Dcommit=no -jar post.jar solr-word.pdf
> SimplePostTool version 1.5
> Posting files to base url http://localhost:8983/solr/techproducts/update..
> {quote}
> This command will *not* post to */update*, it will post to */update/extract*.
> This should be reported correspondingly.
> From the server log:
> {quote}
> 127.0.0.1 - - \[11/Jan/2015:17:17:10 +] "POST
> /solr/techproducts/update/extract?resource.name=
> {quote}
> It would make sense for that message to be after the auto-mode determination
> just before the actual POST.
> Also, what's with two dots after the url? If it is _etc_, it should probably
> be three dots.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273174#comment-14273174 ]

Erik Hatcher commented on SOLR-6959:
------------------------------------

bq. Except this now uncovers a little wrinkle...

ok, ok! :) dang you're thorough, and thanks for that seriously. aligned to "application/xml". no (good) reason they were different.
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273171#comment-14273171 ]

ASF subversion and git services commented on SOLR-6959:
-------------------------------------------------------

Commit 1651028 from [~ehatcher] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1651028 ]

SOLR-6959: standardize XML content-type (merged from trunk r1651027)
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273170#comment-14273170 ]

ASF subversion and git services commented on SOLR-6959:
-------------------------------------------------------

Commit 1651027 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1651027 ]

SOLR-6959: standardize XML content-type
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273133#comment-14273133 ]

Alexandre Rafalovitch commented on SOLR-6959:
---------------------------------------------

Looks good. Except this now uncovers a little wrinkle:
{quote}
$ java -Dc=techproducts -jar post.jar hd.xml
SimplePostTool version 1.5
Posting files to \[base] url http://localhost:8983/solr/techproducts/update using content-type application/xml...
POSTing file hd.xml to \[base]
{quote}
vs.
{quote}
$ java -Dc=techproducts -Dauto -jar post.jar hd.xml
SimplePostTool version 1.5
Posting files to \[base] url http://localhost:8983/solr/techproducts/update...
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file hd.xml (text/xml) to \[base]
{quote}
Is there a reason we are using different content types for the same XML file with and without *-Dauto*?
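As an illustrative aside, the mismatch discussed above goes away if both modes consult a single lookup table. The following is a minimal Python sketch under stated assumptions, not code from SimplePostTool: the names `CONTENT_TYPES`, `DEFAULT_CONTENT_TYPE`, and `content_type_for` are hypothetical, only the content types quoted in this thread are listed, and the fallback for unknown extensions is an assumption.

```python
import os

# Hypothetical lookup table; only content types quoted in this thread are listed.
CONTENT_TYPES = {
    "xml": "application/xml",
    "csv": "text/csv",
    "pdf": "application/pdf",
}

# Content type reported in the thread when -Dauto is absent.
DEFAULT_CONTENT_TYPE = "application/xml"


def content_type_for(filename, auto=False):
    """Return the content type a post.jar-like tool might report for a file."""
    if not auto:
        return DEFAULT_CONTENT_TYPE
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    # Assumption: unknown extensions fall back to a generic binary type.
    return CONTENT_TYPES.get(ext, "application/octet-stream")
```

With one table, `content_type_for("hd.xml")` and `content_type_for("hd.xml", auto=True)` agree, which is the behavior the follow-up commit describes as standardizing the XML content type.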
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273121#comment-14273121 ]

Alexandre Rafalovitch commented on SOLR-6959:
---------------------------------------------

Actually, these days, these two handlers are commented out in the source code and are instead hard-coded as an implicit handler. Causing confusion of their own (SOLR-6938). FWIW.
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273119#comment-14273119 ]

Erik Hatcher commented on SOLR-6959:
------------------------------------

bq. This could also clarify a bit the situation with the fact that XML, CSV, and JSON go to the same handler, yet we have - slightly confusingly - request handlers for both CSV and JSON in the solrconfig.xml

Well, if someone is using post.jar, chances are he/she isn't aware of the additional handlers that you mention so there wouldn't be any confusion I don't think. Those handlers are just there for backwards compatibility (or for aesthetics, if one likes to post to, say, /update/csv). I don't think we need to do anything different here.

> SimplePostTool reports incorrect base url for PDFs
> --------------------------------------------------
>
> Key: SOLR-6959
> URL: https://issues.apache.org/jira/browse/SOLR-6959
> Project: Solr
> Issue Type: Bug
> Components: scripts and tools
> Affects Versions: 5.0
> Reporter: Alexandre Rafalovitch
> Assignee: Erik Hatcher
> Priority: Minor
> Labels: tools
>
> {quote}
> $ java -Dc=techproducts -Dauto -Dcommit=no -jar post.jar solr-word.pdf
> SimplePostTool version 1.5
> Posting files to base url http://localhost:8983/solr/techproducts/update..
> {quote}
> This command will *not* post to */update*, it will post to */update/extract*.
> This should be reported correspondingly.
> From the server log:
> {quote}
> 127.0.0.1 - - \[11/Jan/2015:17:17:10 +] "POST
> /solr/techproducts/update/extract?resource.name=
> {quote}
> It would make sense for that message to be after the auto-mode determination
> just before the actual POST.
> Also, what's with two dots after the url? If it is _etc_, it should probably
> be three dots.
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273114#comment-14273114 ]

ASF subversion and git services commented on SOLR-6959:
-------------------------------------------------------

Commit 1651016 from [~ehatcher] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1651016 ]

SOLR-6959: Elaborate on URLs being POSTed to (merged from trunk r1651013)
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273113#comment-14273113 ]

ASF subversion and git services commented on SOLR-6959:
-------------------------------------------------------

Commit 1651015 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1651015 ]

SOLR-6959: Elaborate on URLs being POSTed to
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273012#comment-14273012 ]

Alexandre Rafalovitch commented on SOLR-6959:
---------------------------------------------

This is a very interesting and educational question. The fact that */update* is a *base* is not well explained anywhere. I just ran the test
{quote}
java -Durl=http://localhost:8983/solr/techproducts/update2 -Dauto -jar post.jar *
{quote}
and it did do *POST /solr/techproducts/update2/extract* for the PDF file. Not what I expected somehow.

My main concern is reducing the magic through a better message. If somebody posted a file and something unexpected happened, they would troubleshoot it by following the _request handler_ and its parameters as one of the steps. But we don't tell them here which request handler it is. We give only one piece of information here, which just happens to also be a valid _request handler_. They could pick that information up from the log file, I guess, if they had access to it and knew what to look for. But it would be easier if the tool was more clear about it, as it does not know exactly what happened.

What if we add something like this to the message:
{quote}
POSTing file books.csv (text/csv) to \[base]
POSTing file solr-word.pdf (application/pdf) to \[base]/extract
{quote}
Where the word \[base] is just that - the word.

This could also clarify a bit the situation with the fact that XML, CSV, and JSON go to the same handler, yet we have - slightly confusingly - request handlers for both CSV and JSON in the solrconfig.xml.

The help message for the tool needs to be improved as well. It says *solr-update-url* and nothing about base and suffixes.
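For illustration only, the per-file message proposed above could be produced by a small formatter. This is a Python sketch rather than the tool's actual Java code; the function name `describe_post` and its parameters are hypothetical, and the literal word "[base]" is printed as-is, as the comment suggests.

```python
def describe_post(filename, content_type, suffix=""):
    """Format one per-file log line in the proposed '[base]' style.

    `suffix` is whatever gets appended to the base URL for this file,
    for example "/extract" for a PDF; it is empty for plain updates.
    """
    return "POSTing file {} ({}) to [base]{}".format(filename, content_type, suffix)


if __name__ == "__main__":
    print(describe_post("books.csv", "text/csv"))
    print(describe_post("solr-word.pdf", "application/pdf", "/extract"))
```

Run as a script, this prints the two example lines from the proposal, so a reader can see which files were sent to the base URL and which were sent to a suffixed handler.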
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273002#comment-14273002 ]

Erik Hatcher commented on SOLR-6959:
------------------------------------

The url to post files to is determined on a per-file basis, which could be a directory of files where .xml files go to /update and .pdf files go to /update/extract. The logging message does qualify that it is the "base" URL. Would you want the URL logged for *every* file?
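The per-file URL determination described in this comment can be sketched roughly as follows. This is a hedged Python illustration, not the SimplePostTool implementation: `endpoint_for` and `STRUCTURED_EXTENSIONS` are hypothetical names, and the assumption that every extension other than xml, json, and csv is sent to the `/extract` suffix in auto mode is inferred from the thread, not confirmed by the source.

```python
import os

# Assumption: these extensions are posted directly to the base update URL.
STRUCTURED_EXTENSIONS = {"xml", "json", "csv"}


def endpoint_for(base_url, filename, auto=True):
    """Pick the URL a single file would be POSTed to.

    Without auto mode everything goes to the base URL. In auto mode,
    files that are not xml/json/csv get the "/extract" suffix appended
    to whatever base URL was configured.
    """
    if not auto:
        return base_url
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in STRUCTURED_EXTENSIONS:
        return base_url
    return base_url.rstrip("/") + "/extract"
```

Because the suffix is appended to the configured base, a custom base such as `.../update2` would yield `.../update2/extract` for a PDF, which matches the behavior reported elsewhere in this thread.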
[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs
[ https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272996#comment-14272996 ]

Alexandre Rafalovitch commented on SOLR-6959:
---------------------------------------------

Also, at least the parameters passed with -Dparams are shown in that log message. The PDF code adds some parameters internally (like literal.id). Should they be shown as well? They are very long though (full file path).
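To show why these internally added parameters get long, here is a small Python sketch that builds an extract URL carrying `resource.name` and `literal.id`. It is an assumption-laden illustration: `extract_url` is a hypothetical helper, the file path is made up, and the exact set and order of parameters the real tool sends are not confirmed by this thread beyond the two names mentioned.

```python
from urllib.parse import urlencode


def extract_url(base_url, file_path):
    """Build a hypothetical /extract URL with the file path in both parameters."""
    params = {
        "resource.name": file_path,  # seen (truncated) in the server log
        "literal.id": file_path,     # mentioned as added internally for PDFs
    }
    # urlencode percent-encodes the slashes in the path, which adds to the length.
    return base_url.rstrip("/") + "/extract?" + urlencode(params)
```

Even for a short made-up path such as `/tmp/solr-word.pdf`, the query string repeats the encoded path twice, which suggests why logging it in full for every file could be noisy.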