Re: Pushing a whole set of pdf-files to solr

2013-04-24 Thread sdspieg
I am still struggling with this. I have Solr 4.2.1.2013.03.26.08.26.55
installed. So are you telling me that I should somehow install the older
version of that tool that comes with Solr 3.x? Because with the newer
version I get the errors I already mentioned. Now I suppose I may be an
atypical user, as I am running all of this under Windows and really just
want to find an easy way to get a whole bunch of files from a local folder
(on my hard drive) into my local version of Solr. Is there really no
easier way of doing this?

-Stephan 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4058776.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pushing a whole set of pdf-files to solr

2013-04-24 Thread sdspieg
(Just documenting my experiences.) I stopped and restarted Solr in the Tomcat
web application manager. Everything seems fine:
http://lucene.472066.n3.nabble.com/file/n4058786/4-25-2013_2-38-43_AM.png
And yet I still get that same error message.





Re: Pushing a whole set of pdf-files to solr

2013-04-10 Thread sdspieg
Jack - I apologize for my ignorance here, but when you keep emphasizing 'new'
- does that mean that there is ANOTHER version of this tool besides the one
that is built into solr-4.2.1?
And on the encoding issue - I thought PDF was platform-agnostic? Or is the
problem on my Windows system - i.e. that it extracts the (correctly encoded)
text into Win-1251, which Solr then has a problem with? If so, can't I change
that somewhere?





Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
If anybody could still help me out with this, I'd really appreciate it.
Thanks!





Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
Thanks for those replies. I will look into them. But if anyone knows of a
site that describes step by step how a Windows user who has already
installed Solr (and Tomcat) can easily feed a folder (and subfolders) with
100s of PDFs into Solr, or would be willing to write down those steps,
I would really appreciate the reference. And I bet you there are lots of
people like me...





Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
I am able to run the java -jar post.jar -help command which I found here:
http://docs.lucidworks.com/display/solr/Running+Solr. But now how can I tell
post.jar to post all PDF files in a certain folder (preferably recursively)
to a collection? Could anybody please post the exact command for that?
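
One possible approach, sketched here in Python rather than with post.jar: walk
the folder recursively and send each PDF to Solr's extracting request handler
(/update/extract), using the file path as the document id. The URL below
(port 8983, no core name) and the helper names find_pdfs, extract_url,
post_pdf and index_folder are assumptions for illustration only. A Tomcat
install will likely use a different port and path, and the handler has to be
enabled in solrconfig.xml.

```python
import os
import urllib.parse
import urllib.request

# Assumed endpoint; adjust host, port and core name to the local install.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def find_pdfs(root):
    """Return the sorted paths of all .pdf files under root, recursively."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def extract_url(base_url, doc_id, commit=False):
    """Build the request URL, passing the document id as literal.id."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_pdf(path, base_url=SOLR_EXTRACT_URL, commit=False):
    """POST the raw bytes of one PDF, declared as application/pdf."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        extract_url(base_url, path, commit),
        data=body,
        headers={"Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_folder(root, base_url=SOLR_EXTRACT_URL):
    """Post every PDF under root; ask Solr to commit on the last one."""
    paths = find_pdfs(root)
    for i, path in enumerate(paths):
        post_pdf(path, base_url, commit=(i == len(paths) - 1))
    return len(paths)
```

Calling index_folder(r"C:\path\to\pdfs") (a hypothetical path) would then post
every PDF in that folder and its subfolders. The usage text printed by
java -jar post.jar -help in Solr 4.x also lists -Dauto and -Drecursive options
(for example java -Dauto -Drecursive -jar post.jar afolder), which may achieve
the same without any script; it is worth checking that against the help output
of the installed version.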





Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
Another progress report. I 'flattened' all the folders which contained the
PDF files with FileBoss and then moved the PDF files to the directory where
I found the post.jar file (in solr-4.2.1\solr-4.2.1\example\exampledocs). I
then ran java -Ddata=files -jar post.jar *.pdf and in the command window
it seemed to be working fine (these are just academic articles in PDF format
that I downloaded with Zotero from EBSCO):
04/10/2013  12:20 AM       159,224 Vorontsov - 2012 - The Korea- Russia Gas Pipeline Project Past, Pres.pdf
04/10/2013  12:12 AM     3,885,056 Walker - 2012 - Asia competes for energy security.pdf
04/10/2013  12:45 AM        66,195 Whitmill - 2012 - Is UK Energy Policy Driving Energy Innovation - or.pdf
04/10/2013  12:29 AM     2,208,367 Wietfeld - 2011 - Understanding Middle East Gas Exporting Behavior.pdf
04/10/2013  12:59 AM     3,011,185 Wiseman - 2011 - Expanding Regional Renewable Governance.pdf
04/10/2013  12:38 AM       180,692 Woudhuysen - 2012 - Innovation in Energy Expressions of a Crisis, and.pdf
04/10/2013  12:49 AM       229,991 Yergin - 2012 - How Is Energy Remaking the World.pdf
04/10/2013  12:40 AM     3,397,328 Young - 2012 - Industrial Gases. (cover story).pdf
04/10/2013  01:36 AM        73,125 Zimmerer - 2011 - New Geographies of Energy Introduction to the Spe.pdf
... and so on, all together some 300 articles.

But then when I looked in solr, I saw the following:
04:34:41  SEVERE  SolrCore  org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)
04:34:41  SEVERE  SolrCore  org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)

... and a lot more of those.

I'd like to think I made SOME progress, but it also seems like I'm still not
close to being there. Any suggestions from the experts here on what I am
doing wrong? 
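
A likely reading of this error, offered as an assumption rather than a
confirmed diagnosis: with -Ddata=files and no content type set, the PDFs go to
the XML update handler, which tries to read the raw PDF bytes as UTF-8 text.
The Python sketch below shows that a common PDF header is not valid UTF-8 and
that decoding fails at the same position the log reports. The sample bytes and
the helper name first_invalid_utf8_offset are illustrative assumptions, not
taken from the files in this thread.

```python
from typing import Optional

# Assumed example of the first bytes of a PDF: the "%PDF-x.y" header line,
# then a comment line of high-bit bytes that marks the file as binary.
SAMPLE_PDF_HEADER = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset where UTF-8 decoding fails, or None if it succeeds."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


offset = first_invalid_utf8_offset(SAMPLE_PDF_HEADER)
# 0xe2 at offset 10 opens a multi-byte sequence; the next byte, 0xe3,
# is not a valid continuation ("middle") byte.
print(offset, hex(SAMPLE_PDF_HEADER[offset + 1]))  # prints: 10 0xe3
```

If that reading is right, the files themselves are fine and the fix is to send
them to the extracting handler (/update/extract) with a PDF content type
instead of posting them as XML.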

Thanks!

-Stephan


