Re: Question on Solr/WordPress Integration
If you're more familiar with PHP, you can do the same using the Solarium library instead of SolrJ for Java.

Once the PDFs are extracted and indexed, Drupal is an alternative to WordPress as a front end. Using the Search API Solr module, you can access and "present" any existing Solr index without a single line of custom code.

Markus
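
Whichever front end ends up in front of the index, the search-with-highlighting requirement from the original question comes down to a request to Solr's select handler with the standard highlighting parameters (hl, hl.fl, hl.snippets). Below is a minimal sketch in Python that only builds the request URL, using the standard library. The base URL, the core name "newspapers" and the field name "content" are assumptions for illustration; none of them come from this thread.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, core, terms, field="content", snippets=3):
    """Return a Solr /select URL that asks for highlighted snippets.

    base_url, core and field are illustrative assumptions; adjust them
    to match the actual Solr installation and schema.
    """
    params = {
        "q": terms,               # the user's search terms
        "df": field,              # default field to search in
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field(s) to take snippets from
        "hl.snippets": snippets,  # maximum snippets per document
        "wt": "json",             # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = build_highlight_query(
    "http://localhost:8983/solr", "newspapers", "harvest festival"
)
print(url)
```

The response to such a request carries a separate highlighting section keyed by document id, which is what a front end would render under each hit.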
Re: Question on Solr/WordPress Integration
Writing a Java (SolrJ) program that traverses a filesystem and extracts the contents of PDFs is actually quite simple, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/ (you can ignore the RDBMS stuff). That code is a little out of date, so it may need some very minor tweaks.

Tika (the library Solr uses to parse PDFs and most other files) may have something that makes the job even easier; I'd ask on their users' list. Putting WordPress in the middle of it all seems unnecessarily complicated.

Best,
Erick
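
The traversal part of such a program can be sketched roughly as follows. This is Python with the standard library only, not the SolrJ code from the linked article. It assumes the archive is laid out exactly as newspaper/date/page.pdf under one root directory, and that the Solr schema has fields named id, newspaper, date, page and content; those field names are hypothetical. Text extraction is left to a caller-supplied function (for example a wrapper around Tika), so nothing here calls a real extraction library.

```python
import json
from pathlib import Path


def describe_pdf(root, pdf_path):
    """Map <root>/<newspaper>/<date>/<page>.pdf to a dict of Solr fields."""
    newspaper, date, filename = Path(pdf_path).relative_to(root).parts
    page = Path(filename).stem
    return {
        "id": f"{newspaper}/{date}/{page}",  # hypothetical unique key
        "newspaper": newspaper,
        "date": date,
        "page": page,
    }


def collect_documents(root, extract_text):
    """Walk the archive and build one document per PDF.

    extract_text is a placeholder callable supplied by the caller; it
    receives the PDF path and returns the extracted text.
    """
    root = Path(root)
    docs = []
    # Matches exactly three levels; the ".pdf" match is case-sensitive
    # on case-sensitive filesystems.
    for pdf_path in sorted(root.glob("*/*/*.pdf")):
        doc = describe_pdf(root, pdf_path)
        doc["content"] = extract_text(pdf_path)
        docs.append(doc)
    return docs


def to_update_payload(docs):
    """Serialize the documents as a JSON array for Solr's /update handler."""
    return json.dumps(docs, ensure_ascii=False)
```

The resulting JSON array can be POSTed to the core's /update handler with Content-Type application/json, followed by a commit. Sending it in batches rather than one request per page is usually the better choice for 25k files.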
Re: Question on Solr/WordPress Integration
Thank you, Shawn!
Re: Question on Solr/WordPress Integration
On 3/1/2019 10:25 AM, Paul Buiocchi wrote:
> I have a couple of questions about Solr/WordPress integration -

You would need to talk to the person who wrote the plugin for WordPress that integrates with Solr. If they indicate that a question can only be answered by the Solr project, then bring that to us.

> I am putting together an old newspaper archive site, about 25k PDF files
> that are full-text searchable.

If you want Solr to index your PDF documents, you would have to use SolrCell, also known as the Extracting Request Handler. We strongly recommend that this functionality never be used in production. The reason is that the underlying technology, Apache Tika, can crash when given certain input. PDF documents are more likely than other kinds to cause this problem. If Tika crashes when it is being run inside Solr, then Solr will also crash.

> Questions on architecture:
> 1) Is there a way for Solr to index from a local file structure, i.e. local
> drive:/newpaper_name/date/page#? From the experimenting I have done with
> WordPress/Solr integration, I found that I had to upload the documents in
> WordPress to get Solr to recognize them.

Yes, you can index just about anything you like if you are willing to create the configuration and the software to do it. But in order for WordPress to understand that data, it most likely would have to be done through WordPress.

Thanks,
Shawn
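
One way to keep a parser crash away from Solr is to run the extraction outside Solr, in its own process, and send only the resulting text to the index. The sketch below shows that idea in Python with the standard library. It assumes some command-line extractor that prints the extracted text to standard output; the command is supplied by the caller and is purely hypothetical, not a documented Tika invocation. The two stand-in commands at the bottom exist only so the sketch runs without any extractor installed.

```python
import subprocess
import sys


def extract_in_subprocess(command, timeout_seconds=60):
    """Run an external extractor; return its stdout, or None on failure.

    A crash, a non-zero exit code, a hang or a missing executable only
    affects the child process, so the indexing loop can skip the file.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None  # extractor hung or could not be started
    if result.returncode != 0:
        return None  # extractor crashed or reported an error
    return result.stdout


# Stand-in commands: one that succeeds, one that exits with an error.
ok = extract_in_subprocess([sys.executable, "-c", "print('page text')"])
failed = extract_in_subprocess([sys.executable, "-c", "import sys; sys.exit(3)"])
print(repr(ok), repr(failed))
```

Files that come back as None can be logged and retried or inspected by hand, while the rest of the archive keeps indexing.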
Question on Solr/WordPress Integration
Greetings,

I have a couple of questions about Solr/WordPress integration.

First, I am not committed to using WordPress as a front end. If there is a better front-end option, I would be willing to convert. For functionality, all I am looking for is full-text search with the search terms highlighted in the search results. It should be pretty simple; maybe I am overanalyzing it. I am looking for as much "out of the box" as possible.

My scenario is this: I am putting together an old newspaper archive site, about 25k PDF files that are full-text searchable.

Questions on architecture:

1) Is there a way for Solr to index from a local file structure, i.e. local drive:/newpaper_name/date/page#? From the experimenting I have done with WordPress/Solr integration, I found that I had to upload the documents in WordPress to get Solr to recognize them.

I'm sure I will have more questions. Any help or suggestions would be greatly appreciated. Thank you.