[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-15 Thread Michael Osipov (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743287#comment-16743287
 ] 

Michael Osipov commented on CONNECTORS-1564:


Hi [~kwri...@metacarta.com], I was on a business trip for a couple of days. I 
will -- hopefully -- pick this up on Thu.

> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should authenticate preemptively when the Solr server requires basic 
> authentication. This makes the communication between ManifoldCF and Solr 
> more efficient than the current exchange:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request again, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.
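
The single-request exchange described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the attached ManifoldCF patch; the URL, core name, and credentials are hypothetical placeholders, and no request is actually sent.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an 'Authorization: Basic' header (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_preemptive_post(url: str, body: bytes, user: str, password: str) -> urllib.request.Request:
    """Prepare a POST that already carries credentials, so the server
    never has to answer the first attempt with a 401 challenge."""
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


# Hypothetical endpoint and credentials, for illustration only.
req = build_preemptive_post(
    "http://localhost:8983/solr/example/update", b"[]", "solr", "SolrRocks"
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Without the preemptive header, a client typically sends the body once, receives the 401, and then sends the body a second time with credentials, which matters most for large documents.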



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743186#comment-16743186
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~michael-o], any updates?




[ANNOUNCE] Apache Roadshow Chicago, Call for Presentations

2019-01-15 Thread Trevor Grant
Hello Devs!


You're receiving this email because you are subscribed to one or more
Apache developer email lists.

I’m writing to let you know about an exciting event coming to the Chicago
area: The Apache Roadshow Chicago.  It will be held May 13th and 14th at
three bars in the Logan Square neighborhood (Revolution Brewing, The
Native, and the Radler).

There will be six tracks:

   - Apache in Adtech: Tell us how Apache works in your advertising stack
   - Apache in Fintech: Tell us how Apache works in your finance/insurance
     business
   - Apache in Startups: Tell us how you’re using Apache in your startup
   - Diversity in Apache: How do we increase and encourage diversity in
     Apache and tech fields overall?
   - Made in Chicago: Apache-related things made by people in Chicago that
     don’t fall into other buckets
   - Project Shark Tank: Do you want more developers or users for your Apache
     project? Come here and pitch it!


This is an exciting chance to learn how Apache projects are used in
production around Chicago, how business users decide to adopt Apache
projects, which exciting new projects want help from developers like you,
and how and why to increase diversity in tech and IT.

If you have any use cases of Apache products in Adtech, Fintech, or
Startups; if you represent a minority working in tech and have perspectives
to share; if you live in the Chicagoland area and want to highlight some
work you’ve done on an Apache project; or if you want to get other people
excited to come work on your project, then please submit a proposal before
the CFP deadline on February 15th!

Tickets to the Apache Roadshow Chicago are $100; speakers will get a
complimentary ticket.

We’re looking forward to reading your submissions and seeing you there on
May 13-14!

Sincerely,

Trevor Grant

https://www.apachecon.com/chiroadshow19/cfp.html

https://www.apachecon.com/chiroadshow19/register.html


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743033#comment-16743033
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please also see this discussion:

https://issues.apache.org/jira/browse/CONNECTORS-1533



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> When I check the "Use the Extract Update Handler:" parameter, I get the 
> following error on Solr: null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika 
> extractor in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance.





[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743028#comment-16743028
 ] 

Karl Wright commented on CONNECTORS-1563:
-

First, I asked for the Simple History, not the manifoldcf logs.  What does the 
simple history say about document ingestions for the connection in question 
with the new configuration?

But, from your solr log:

{code}
2019-01-15 11:51:54.211 ERROR (qtp592617454-22) [   x:eesolr_webcrawler] 
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
{code}

Note that the stack trace is from the ExtractingDocumentLoader, which is Tika.  
You did not manage to actually change the output handler to the non-extracting 
one, possibly because you have configured your Solr in a non-default way.  I 
cannot debug that for you, sorry.
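
The check described above, looking at which class the stack trace comes from, can be automated for a longer log. The following is a rough sketch in Python, not part of Solr or ManifoldCF; the helper names are hypothetical, and the only assumption about Solr is that its extracting handler classes live in the `org.apache.solr.handler.extraction` package, as the frame quoted above shows.

```python
import re

# Matches one Java stack frame such as "pkg.Class.method(File.java:123)",
# with or without a leading "at ".
FRAME = re.compile(r"^\s*(?:at\s+)?([\w.$]+)\.(\w+)\(([\w.]+):(\d+)\)\s*$")

# Package of the extracting (Tika-based) handler classes seen in the log above.
EXTRACTION_PACKAGE = "org.apache.solr.handler.extraction."


def stack_frames(log_text):
    """Return (class name, method name, line number) for each frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append((match.group(1), match.group(2), int(match.group(4))))
    return frames


def uses_extracting_handler(log_text):
    """True if any stack frame belongs to the extraction package."""
    return any(cls.startswith(EXTRACTION_PACKAGE) for cls, _, _ in stack_frames(log_text))


# The log excerpt quoted in this comment, with the mail line wrapping kept.
SAMPLE_LOG = """2019-01-15 11:51:54.211 ERROR (qtp592617454-22) [   x:eesolr_webcrawler]
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException:
org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
"""

print(stack_frames(SAMPLE_LOG))
print(uses_extracting_handler(SAMPLE_LOG))
```

If this reports the extraction package after the output connection was switched to the plain update handler, the requests are still reaching the extracting handler on the Solr side.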

Can you do the following:  Download the current 7.x version of Solr, fresh, and 
extract it.  Start it using the standard provided simple scripts.  Point 
ManifoldCF at it and crawl some documents, using the setup for the connection I 
have described.  Does that work?  If it does, and I expect it to because that 
is what works for me here, then it is your job to figure out what you did to 
Solr to make that not work.




[jira] [Updated] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Subasini Rath (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subasini Rath updated CONNECTORS-1563:
--
Attachment: Document simple history.docx
manifoldcf.log
solr.log

Please find the log files and document simple history.



Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au





[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743006#comment-16743006
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please include [INFO] messages from the Solr log for example indexing requests, 
and also include records from the Simple History for documents indexed with the 
new configuration.  Thanks.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, manifold settings.docx, solrconfig.xml


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742949#comment-16742949
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please view the Solr connection and click the button that tells it to forget 
about everything it has indexed.  That will force reindexing.  That's a 
standard step when you change configuration like this and you want all 
documents to be reindexed.




[jira] [Updated] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Subasini Rath (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subasini Rath updated CONNECTORS-1563:
--
Attachment: manifold settings.docx

Hi Karl,
I tried the suggestions in your email below, but had no luck.
Please find attached the screenshots of my ManifoldCF settings. 
Could you please take another look and let me know if I am missing something?

Also, as per your suggestion, in the Solr output connection:
 [Paths] tab ---> I changed the update handler to /update instead of 
/update/extract.
 [Schema] tab ---> I deselected [Use the Extract Update Handler:]. 
What I observe is that no indexing happened in Solr.




Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au





[jira] [Comment Edited] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Subasini Rath (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742883#comment-16742883
 ] 

Subasini Rath edited comment on CONNECTORS-1563 at 1/15/19 9:12 AM:


Hi Karl,

I tried the suggestions in your email below, but had no luck.

Please find attached the screenshots of my ManifoldCF settings.

Could you please take another look and let me know if I am missing something?

Also, as per your suggestion, in the Solr output connection:

 [Paths] tab ---> I changed the update handler to /update instead of 
/update/extract.

 [Schema] tab ---> I deselected [Use the Extract Update Handler:].

What I observe is that no indexing happened in Solr.



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Subasini Rath (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742883#comment-16742883
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,

I tried the suggestions in your email below, but had no luck.

Please find attached the screenshots of my ManifoldCF settings.

Could you please take another look and let me know if I am missing something?

Also, as per your suggestion, in the Solr output connection:

 [Paths] tab ---> I changed the update handler to /update instead of 
/update/extract.

 [Schema] tab ---> I deselected [Use the Extract Update Handler:].

What I observe is that no indexing happened in Solr.
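
For readers comparing the two handler paths mentioned above, here is a rough sketch of how requests to them differ at the HTTP level. It is written in Python with the standard library only and is not ManifoldCF's implementation; the core URL is a hypothetical placeholder, the `content` field name is taken from the issue description and assumes the schema defines it, and no request is actually sent. The empty-input guard mirrors the Tika error in this issue, which rejects a zero-byte stream.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical core URL, for illustration only.
SOLR_CORE_URL = "http://localhost:8983/solr/example"


def build_update_request(doc_id: str, text: str) -> urllib.request.Request:
    """/update: the client sends fields it has already extracted, as JSON."""
    payload = json.dumps([{"id": doc_id, "content": text}]).encode("utf-8")
    return urllib.request.Request(
        SOLR_CORE_URL + "/update",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_extract_request(doc_id: str, raw_bytes: bytes) -> urllib.request.Request:
    """/update/extract: the client sends the raw file and Solr Cell runs Tika."""
    if not raw_bytes:
        # Tika raises ZeroByteFileException for an empty stream, so refuse early.
        raise ValueError("refusing to send an empty document to /update/extract")
    query = urllib.parse.urlencode({"literal.id": doc_id})
    return urllib.request.Request(
        f"{SOLR_CORE_URL}/update/extract?{query}",
        data=raw_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


update_req = build_update_request("doc-1", "hello world")
extract_req = build_extract_request("doc-1", b"%PDF-1.4 ...")
print(update_req.full_url)
print(extract_req.full_url)
```

With /update, the text must already have been extracted before the document reaches Solr, which is why the content field stays empty when no extractor runs anywhere in the pipeline.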
