[jira] [Commented] (CONNECTORS-254) Bad request when posting 0 byte file to Solr

2011-09-13 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103678#comment-13103678
 ] 

Karl Wright commented on CONNECTORS-254:


The --data-binary switch to curl causes it to use POST.  Here's a bit of the 
manpage:

 --data-binary 
  (HTTP) This posts data exactly as specified with no extra processing
  whatsoever.

So, what happens when you use curl to post your zero-length file?  Do you get 
back a 400 response?  Try curl's -v switch to see exactly what it sends and 
receives.

If it comes back with a 200 OK response and not a 400, then either you or I 
should try the same thing while Wireshark is capturing packets.  If you 
are on a Linux system, you may instead want to use tcpdump for the 
capture, and then examine the capture with Wireshark (on Windows).  If this is 
too confusing, I can try this later today, now that I have a test case.
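
A minimal sketch of the check described above, assuming Solr is listening at the localhost URL used elsewhere in this thread. RUN_CURL is a hypothetical guard variable introduced only for this example, so the script does nothing over the network unless asked to.

```shell
#!/bin/sh
# Sketch: POST a zero-length file and report the HTTP status curl receives.
# The URL is the one quoted in this thread; adjust it for your Solr instance.
URL='http://localhost:8983/solr/update/extract?literal.id=1'

# Create an empty file to act as the zero-byte document.
empty_file=$(mktemp)
size=$(wc -c < "$empty_file" | tr -d ' ')
echo "test file size: $size bytes"

# Only contact the server when explicitly asked to (RUN_CURL is a
# hypothetical guard variable, not something curl or Solr defines).
if [ "${RUN_CURL:-0}" = "1" ]; then
  # -v prints the request and response headers on stderr;
  # -w '%{http_code}' prints just the final status code on stdout.
  status=$(curl -v -o /dev/null -w '%{http_code}' \
    --data-binary "@$empty_file" \
    -H 'Content-Type: application/octet-stream' \
    "$URL")
  echo "HTTP status: $status"
fi

rm -f "$empty_file"
```

Running it with RUN_CURL=1 should show on stderr whether curl really issues a POST and which Content-Length it sends for the empty body.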


> Bad request when posting 0 byte file to Solr
> 
>
> Key: CONNECTORS-254
> URL: https://issues.apache.org/jira/browse/CONNECTORS-254
> Project: ManifoldCF
> Issue Type: Bug
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2, ManifoldCF 0.3
> Reporter: Shinichiro Abe
> Priority: Minor
> Fix For: ManifoldCF 0.3
>
> Attachments: CONNECTORS-254-1.patch, sample0byte.zip
>
>
> It seems that httpposter causes a bad request when posting a 0-byte file.
> The Solr log says the following: "missing content stream". The status code is 400.
> On the other hand when using Solr request handler without MCF, this exception 
> is not thrown and the posting 0 byte files is indexed normally.
>  
> 2011/09/13 12:30:40 org.apache.solr.core.SolrCore execute
> ???: [] webapp=/solr path=/update/extract 
> params={literal.id=file:/Users/abe/Desktop/1/no-content/no-content.txt&literal.uri=/Users/abe/Desktop/1/no-content/no-content.txt}
>  status=400 QTime=367 
> 2011/09/13 12:30:40 org.apache.solr.common.SolrException log
> ?v???I: org.apache.solr.common.SolrException: missing content stream
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CONNECTORS-254) Bad request when posting 0 byte file to Solr

2011-09-13 Thread Shinichiro Abe (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103670#comment-13103670
 ] 

Shinichiro Abe commented on CONNECTORS-254:
---

I'm sorry. 
I modified post.sh and posted a 0-byte file, but the exception isn't thrown in 
this case. Is this a GET?
 
---
# Post each file named on the command line as the raw request body.
URL='http://localhost:8983/solr/update/extract?literal.id=1'

for f in "$@"; do
  curl "$URL" --data-binary "@$f" -H 'Content-Type: application/octet-stream'
done
---





[jira] [Commented] (CONNECTORS-254) Bad request when posting 0 byte file to Solr

2011-09-13 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103639#comment-13103639
 ] 

Karl Wright commented on CONNECTORS-254:


The curl command you describe is not a POST at all; it's a GET.  The 
document itself is not being sent, just the file name.  Solr's update handler 
may well use the stream.file argument to open the file directly.  That's not 
going to work with ManifoldCF, though, because there's no guarantee of a shared 
file system between ManifoldCF and the Solr instance.
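
The difference between the two request shapes can be sketched as follows. The helper names stream_file_url and post_document are hypothetical, introduced only for illustration; the base URL and parameters are the ones quoted in this thread.

```shell
#!/bin/sh
# Base URL taken from the commands quoted in this thread.
SOLR='http://localhost:8983/solr/update/extract'

# Variant 1 (GET): only a path is sent, in the stream.file parameter.
# Solr must be able to open that path on its own file system.
stream_file_url() {
  # $1 = document id, $2 = path as seen from the Solr host
  printf '%s?literal.id=%s&stream.file=%s&commit=true\n' "$SOLR" "$1" "$2"
}

# Variant 2 (POST): the document bytes travel in the request body,
# so the sender and Solr need not share a file system.
post_document() {
  # $1 = document id, $2 = local file to upload
  curl --data-binary "@$2" -H 'Content-Type: application/octet-stream' \
    "$SOLR?literal.id=$1"
}

# Print the GET form; nothing is sent over the network here.
stream_file_url 1 /path/to/0bytefile
```

Only the second variant matches what a remote crawler can do, since it never relies on Solr being able to read the sender's disk.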







[jira] [Commented] (CONNECTORS-254) Bad request when posting 0 byte file to Solr

2011-09-13 Thread Shinichiro Abe (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103632#comment-13103632
 ] 

Shinichiro Abe commented on CONNECTORS-254:
---

I agree. My approach was hasty. I'll examine it again (especially the added 
space).

At least the following request posts normally to Solr; the exception isn't 
thrown:

curl "http://localhost:8983/solr/update/extract?literal.id=1&stream.file=/path/to/0bytefile&commit=true"

I want httpposter to behave the same way. What do you think about this?





[jira] [Commented] (CONNECTORS-202) SOLR connector suport for commitWithin

2011-09-13 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103622#comment-13103622
 ] 

Karl Wright commented on CONNECTORS-202:


r1170174 for the documentation update.


> SOLR connector suport for commitWithin
> --
>
> Key: CONNECTORS-202
> URL: https://issues.apache.org/jira/browse/CONNECTORS-202
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 0.2, ManifoldCF 0.3
> Reporter: Jan Høydahl
> Assignee: Karl Wright
> Labels: commit
> Fix For: ManifoldCF 0.4
>
> Attachments: CONNECTORS-202.patch
>
>
> The output connection must support commitWithin 
> (http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22)
>  in addition to sending a commit() at the end of a job.
> This allows for efficient handling of commits on the Solr side.
> The parameter should ideally be configurable per job. In that way you could 
> say that for "Important job" commitWithin=10s while for "Big crawl job", 
> commitWithin=600s.
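
For illustration only, the per-job values in the description could map onto Solr's XML update message roughly as below. Solr expresses commitWithin in milliseconds, so the sketch converts from seconds. The helper build_add and the document ids are hypothetical, and this is not necessarily how the connector itself passes the parameter.

```shell
#!/bin/sh
# commitWithin is expressed in milliseconds in Solr's XML update message.
# build_add is a hypothetical helper; the document ids are placeholders.
build_add() {
  # $1 = commitWithin in seconds, $2 = document id
  printf '<add commitWithin="%d"><doc><field name="id">%s</field></doc></add>\n' \
    "$(( $1 * 1000 ))" "$2"
}

build_add 10 important-doc    # "Important job": commit within 10 s
build_add 600 big-crawl-doc   # "Big crawl job": commit within 600 s
```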





[jira] [Commented] (CONNECTORS-202) SOLR connector suport for commitWithin

2011-09-13 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103589#comment-13103589
 ] 

Karl Wright commented on CONNECTORS-202:


Looks fine.  I'll commit it and update the site this evening.






[jira] [Updated] (CONNECTORS-202) SOLR connector suport for commitWithin

2011-09-13 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated CONNECTORS-202:
---

Attachment: CONNECTORS-202.patch

Proposed end-user documentation update for the update parameters tab, including 
examples for commitWithin and update.chain.





[RESULT][VOTE] Release ManifoldCF 0.3-incubating, RC1

2011-09-13 Thread Karl Wright
Three +1's.  >72 hours.  Vote passes!
Karl

On Mon, Sep 12, 2011 at 9:52 PM, Shinichiro Abe wrote:
>  +1
>
>  The JCIFS Connector and "ant test" work fine!
>
>  Shinichiro Abe
>
>
> On 2011/09/13, at 10:05, Karl Wright wrote:
>
>> Thanks!
>>
>> We need one more binding +1.  Shinichiro?  Simon?  Erlend?  Tommaso?
>>
>> Karl
>>
>> On Mon, Sep 12, 2011 at 12:06 PM, Piergiorgio Lucidi wrote:
>>> +1
>>>
>>> The CMIS Connector works fine!
>>>
>>> Piergiorgio
>>>
>>> 2011/9/9 Karl Wright 
>>>
>>>> You can download the release candidate from
>>>> http://people.apache.org/~kwright, and there is also a tag in svn
>>>> under https://svn.apache.org/repos/asf/incubator/lcf/tags.
>>>>
>>>> +1 to release this RC.
>>>> -1 to not release it.
>>>>
>>>> After a successful release vote, please be aware that I will need to
>>>> present the release candidate to the incubator for their vote as well,
>>>> before the release is actually made.
>>>>
>>>> Karl

>>>
>>>
>>>
>>> --
>>> Piergiorgio Lucidi
>>> http://about.me/piergiorgiolucidi
>>>
>
>


[jira] [Commented] (CONNECTORS-254) Bad request when posting 0 byte file to Solr

2011-09-13 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103465#comment-13103465
 ] 

Karl Wright commented on CONNECTORS-254:


I have some concerns about this approach.

First, I'm not sure it is the right thing to do.  It would be good to know more 
about the case you are comparing against when you say "On the other hand when 
using Solr request handler without MCF, this exception is not thrown and the 
posting 0 byte files is indexed normally."  How are you posting the content 
in that case?  When you post it that way, what does the entire HTTP 
request look like?  Either of us can capture it with Wireshark, so we can 
see exactly what happens.

Second, if it turns out that adding a space is the correct thing to do, I'm 
concerned because the change is not reflected in the Content-Length 
header.  If the header reports a different length than what is posted, the 
posted data will be truncated.  That may be the whole goal, though, in which 
case we should add a comment to the code noting that the content MUST be the 
last field posted for this reason.
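
The truncation concern can be illustrated with a small sketch, under the simplifying assumption that the receiver reads exactly as many bytes as the declared length and ignores the rest. read_body is a hypothetical helper that stands in for the receiving side; it is not ManifoldCF or Solr code.

```shell
#!/bin/sh
# Emulate a receiver that reads exactly Content-Length bytes of the body.
# read_body is a hypothetical helper for illustration only.
read_body() {
  # $1 = declared Content-Length, $2 = bytes actually sent
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%s", substr(s, 1, n) }'
}

# A one-space body sent with a declared length of zero: the space is dropped.
received=$(read_body 0 ' ')
echo "declared 0, sent 1: received ${#received} byte(s)"

# The same body with a matching declared length arrives intact.
received_ok=$(read_body 1 ' ')
echo "declared 1, sent 1: received ${#received_ok} byte(s)"
```

Under that assumption, a padded space is visible to the receiver only if the declared length is updated to match.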






[jira] [Updated] (CONNECTORS-254) Bad request when posting 0 byte file to Solr

2011-09-13 Thread Shinichiro Abe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichiro Abe updated CONNECTORS-254:
--

Attachment: CONNECTORS-254-1.patch

When the stream length is zero, the patch sends a single space in its place. 
If this approach is okay, I'll fix the indentation around that code.
