Re: Bitnami, or other Solr on AWS recommendations?

2018-01-26 Thread Phillip Rhodes
Also shameless self-promotion, but my company (Fogbeam Labs) is about
to launch a Solr / ManifoldCF powered Search-as-a-Service offering.
If you'd like to learn more, shoot me an email at prho...@fogbeam.com
and I'd be happy to give you the skinny.


Phil

This message optimized for indexing by NSA PRISM


On Fri, Jan 26, 2018 at 4:01 PM, Sameer Maggon  wrote:
> This is shameless promotion, but have you taken a look at
> SearchStax (https://www.searchstax.com)? Why not use a Solr-as-a-Service?
>
> On Fri, Jan 26, 2018 at 11:24 AM, TK Solr  wrote:
>
>> If I want to deploy Solr on AWS, do people recommend using the prepackaged
>> Bitnami Solr image? Or is it better to install Solr manually on a computer
>> instance? Or is there a better way?
>>
>> TK
>>
>>
>>
>
>
> --
> Sameer Maggon
> Founder, SearchStax, Inc.
> https://www.searchstax.com


Re: Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes
Fair enough.  I'm actually using ManifoldCF to manage the indexing,
and I see that they have a Tika Content Extraction transformer
available, so I'll look into wiring that into my pipeline and see if
that gets me the results I'm looking for.


Thanks,


Phil



On Thu, Dec 21, 2017 at 7:43 PM, Erick Erickson  wrote:
> bq: Is there any way to get reasonable behavior using the
> ExtractingRequestHandler, or should I just dump that approach and plan
> to run Tika outside of Solr, and then send Solr the exact content I
> want?
>
> Actually, this is recommended for a bunch of reasons, so I'd just
> go there straightaway. Tika has all sorts of "interesting" things to
> cope with, and since the underlying file formats are more-or-less
> followed by this vendor or that, there's always the possibility
> that Tika will kill your Solr.
>
> Here's a place to start:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick
>
> On Thu, Dec 21, 2017 at 4:31 PM, Phillip Rhodes
>  wrote:
>> Hi all, I have been having an issue with Solr, using the
>> ExtractingRequestHandler.  Basically, when indexing a PDF (for
>> example) I get all the metadata mixed into the "content" field along
>> with the content.  See:
>> <https://stackoverflow.com/questions/47934257/importing-files-with-solr-cell-tika-is-mixing-metadata-fields-with-content>
>> for the gory details.
>>
>> I'm guessing this is the same basic issue as
>> <https://issues.apache.org/jira/browse/SOLR-9178> which is still
>> unresolved.  But I thought I'd ping the list just to see if anyone had
>> a workaround or any more information on this.
>>
>> Is there any way to get reasonable behavior using the
>> ExtractingRequestHandler, or should I just dump that approach and plan
>> to run Tika outside of Solr, and then send Solr the exact content I
>> want?
>>
>>
>> Thanks,
>>
>>
>>


Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes
Hi all, I have been having an issue with Solr, using the
ExtractingRequestHandler.  Basically, when indexing a PDF (for
example) I get all the metadata mixed into the "content" field along
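with the content.  See:
<https://stackoverflow.com/questions/47934257/importing-files-with-solr-cell-tika-is-mixing-metadata-fields-with-content>
for the gory details.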

I'm guessing this is the same basic issue as
<https://issues.apache.org/jira/browse/SOLR-9178> which is still
unresolved.  But I thought I'd ping the list just to see if anyone had
a workaround or any more information on this.

Is there any way to get reasonable behavior using the
ExtractingRequestHandler, or should I just dump that approach and plan
to run Tika outside of Solr, and then send Solr the exact content I
want?


Thanks,





[SOLVED] [MORE OR LESS] "no such core" error with EmbeddedSolrServer

2012-01-06 Thread Phillip Rhodes
OK, this is working now.  There were a couple of things going on, but
the net-net was "classloading issues."
The initial "no such core" problem was happening because the cores
weren't initializing properly due
to a NoClassDefFoundError.   But after fixing that, I wound up with a
pile of ClassCastExceptions... the kind
you get when there are Classloader conflicts...  I'd seen this kind of
thing before when using Groovy (possibly
because it does extra classloading trickery under the covers?) so I
ported the program to plain old Java
and it works fine.

No biggie... at this point, I've basically decided I'm going to run
Solr as a separate process anyway, and use
the http based SolrJ client for integration, and that works fine even
in the Groovy code.
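
For anyone chasing a similar classloader conflict, a quick first check is to print which loader defined a suspect class and where it was loaded from. This is only a minimal sketch using the JDK; the `WhereLoaded` class and its `describe` helper are made-up names, and the default class name is just a stand-in for whichever Solr class shows up in the ClassCastException.

```java
import java.security.CodeSource;

final class WhereLoaded {
    /** Describes the defining class loader and code location of a class. */
    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();   // null means the bootstrap loader
        CodeSource src = c.getProtectionDomain().getCodeSource();   // may be null for JDK classes
        String from = (src == null || src.getLocation() == null)
                ? "(unknown location)"
                : src.getLocation().toString();
        String by = (loader == null) ? "bootstrap" : loader.toString();
        return c.getName() + " loaded by " + by + " from " + from;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the fully qualified name of the class you suspect is loaded twice.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(describe(Class.forName(name)));
        System.out.println(describe(WhereLoaded.class));
    }
}
```

If the same class name reports two different loaders in different parts of the program, that would be consistent with the ClassCastExceptions described above.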


Thanks for all the help, tips and pointers!  I've learned a lot about
Solr in the past day or two.  :-)



Phil

On Thu, Jan 5, 2012 at 7:45 PM, Phillip Rhodes
 wrote:
> Hi all, I'm having an issue that I hope someone can shed some light on.
>
> I have a Groovy program, using Solr 3.5, where I am attempting to use
> EmbeddedSolrServer using the instructions shown here:
>
> http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
>
> to that end, I have code setup like this:
>
> ...
>
> System.setProperty('solr.solr.home',
> '/usr/servers/solr/apache-solr-3.5.0/example/heceta');
> CoreContainer.Initializer initializer = new CoreContainer.Initializer();
> CoreContainer coreContainer = initializer.initialize();
>
> EmbeddedSolrServer solrServer = new EmbeddedSolrServer(coreContainer, '' );
>
> to initialize the solrServer.  But when I try to add a doc using the solrServer
> instance, I get an exception:
>
> org.apache.solr.common.SolrException: No such core:
>        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:104)
>        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
>        at org.apache.solr.client.solrj.SolrServer$add.call(Unknown Source)
>        at org.fogbeam.exp.IndexerWithOwnerInfo.indexFile(IndexerWithOwnerInfo.groovy:150)
>
>
> My solr.xml looks like this:
>
> 
>
>  
>  
>    
>  
> 
>
>
>
> Can somebody tell me if the documentation on how to set this up is
> wrong, or if there is a solr bug, or if there
> is just something I've missed in my configuration?
>
>
> Thanks,
>
>
> Phillip


Re: "no such core" error with EmbeddedSolrServer

2012-01-06 Thread Phillip Rhodes
On Fri, Jan 6, 2012 at 11:28 AM, Christopher Childs  wrote:
> Multicore does work with EmbeddedSolrServer. It's what we use in our 
> application.
>
> solr.xml is also relevant for configuring the cores. We do not do it in quite 
> the same manner that Phillip is describing, though. Our CoreContainer is 
> initialized by SolrDispatchFilter. After the core container is created, we 
> just do 'new EmbeddedSolrServer(cores, "firstCoreNameInSolrXml");' -- it even 
> worked without a core name specified, but the behavior was erratic and 
> incorrect in other ways in that case.

Cool, that's good to know.

>
> I would suggest enabling logging for the Solr classes that are likely to help 
> explain some of the mystery here: SolrResourceLoader, and CoreContainer. At 
> the info level, they will tell you where they are finding solr.xml and how 
> they are finding it (whether it's passed in via constructor arguments, or 
> through solr.solr.home).


I'll look into that this afternoon then.   Thanks for the pointer.
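
A minimal sketch of what turning on that logging could look like, assuming Solr's SLF4J output is bound to java.util.logging (with log4j or logback, the same two logger names would go into that framework's configuration instead). The `SolrStartupLogging` class and `enable` method are made-up names for illustration.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

final class SolrStartupLogging {
    // Keep strong references: java.util.logging only holds loggers weakly.
    private static final Logger RESOURCE_LOADER =
            Logger.getLogger("org.apache.solr.core.SolrResourceLoader");
    private static final Logger CORE_CONTAINER =
            Logger.getLogger("org.apache.solr.core.CoreContainer");

    /** Raises both loggers to INFO and routes them to a dedicated console handler. */
    static void enable() {
        Handler console = new ConsoleHandler();
        console.setLevel(Level.INFO);
        for (Logger logger : new Logger[] {RESOURCE_LOADER, CORE_CONTAINER}) {
            logger.setLevel(Level.INFO);
            logger.setUseParentHandlers(false);   // avoid duplicate lines from the root handler
            logger.addHandler(console);
        }
    }

    public static void main(String[] args) {
        enable();   // in a real program, call this before creating the CoreContainer
        CORE_CONTAINER.info("INFO logging enabled for " + CORE_CONTAINER.getName());
    }
}
```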


Phil


Re: "no such core" error with EmbeddedSolrServer

2012-01-06 Thread Phillip Rhodes
On Fri, Jan 6, 2012 at 11:03 AM, Erik Hatcher  wrote:
>
> Again, EmbeddedSolrServer (unless I'm gravely mistaken!) doesn't do 
> multicore.  It's for a single core.  If you want multiple cores
> supported... create multiple EmbeddedSolrServer instances.   You point the
> instance to the core's "home".

At this very moment, all I personally care about is getting it to work
with one core.  However, even when I take the approach shown in the
documentation, under where it says "if you want to use multicore
features" it still errors out the same way.  That is,

File home = new File( "/path/to/solr/home" );
File f = new File( home, "solr.xml" );
CoreContainer container = new CoreContainer();
container.load( "/path/to/solr/home", f );

EmbeddedSolrServer server = new EmbeddedSolrServer( container,
"core name as defined in solr.xml" );

It's still showing the configuration as driving off of solr.xml,
although it does tell you to pass the core name explicitly to the
ctor.  But if you're saying that solr.xml isn't used when using
EmbeddedSolrServer, is this example even relevant at all?

When I get home from the $dayjob later and can revisit this, I'll try
the first approach, with the path pointing all the way down to the
specific core.  If that works, I'll be happy, but I'm thinking this
wiki page may need some updating by somebody who understands how all
this works...  :-)


Phil


Re: "no such core" error with EmbeddedSolrServer

2012-01-06 Thread Phillip Rhodes
2012/1/6 Yury Kats :
>
> That probably means the home is not set properly, so it can't find solr.xml
>
Well, all the docs mention doing is this bit, which I have:

System.setProperty('solr.solr.home',
'/usr/servers/solr/apache-solr-3.5.0/example/heceta');

I've also tried it the other way mentioned on the wiki, where you do this:

File home = new File( "/path/to/solr/home" );
File f = new File( home, "solr.xml" );
CoreContainer container = new CoreContainer();
container.load( "/path/to/solr/home", f );

EmbeddedSolrServer server = new EmbeddedSolrServer( container,
"core name as defined in solr.xml" );

and I get the exact same error when trying that approach as well.  :-(
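
Before stepping through Solr itself, it may be worth ruling out the simplest cause, a home directory that does not contain what Solr expects. The sketch below uses only the JDK and assumes the Solr 3.x layout of `solr.xml` in the home directory plus `conf/solrconfig.xml` and `conf/schema.xml` under the core's instanceDir. `SolrHomeCheck`, `missingFiles`, and the default arguments are made-up placeholders.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class SolrHomeCheck {
    /** Returns the expected files that are missing; an empty list means the layout looks sane. */
    static List<Path> missingFiles(Path solrHome, String coreInstanceDir) {
        Path core = solrHome.resolve(coreInstanceDir);
        Path[] expected = {
            solrHome.resolve("solr.xml"),
            core.resolve("conf").resolve("solrconfig.xml"),
            core.resolve("conf").resolve("schema.xml")
        };
        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.isRegularFile(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass your own solr home and the instanceDir from solr.xml.
        Path home = Paths.get(args.length > 0 ? args[0] : "/path/to/solr/home");
        String instanceDir = args.length > 1 ? args[1] : ".";
        List<Path> missing = missingFiles(home, instanceDir);
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete under " + home);
        } else {
            for (Path p : missing) {
                System.out.println("Missing: " + p);
            }
        }
    }
}
```

A clean result only says the files exist. A core can still fail to start if one of them is invalid or, as it turned out in this thread, if a class fails to load.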


Phil


Re: "no such core" error with EmbeddedSolrServer

2012-01-06 Thread Phillip Rhodes
2012/1/6 Yury Kats :
>
> Have you tried passing core name (collection1) to the c'tor, instead
> of the empty string?

Yep, but that gives the same error (with the core name appended) such
as "no such core: collection1"


Phil


Re: "no such core" error with EmbeddedSolrServer

2012-01-06 Thread Phillip Rhodes
On Fri, Jan 6, 2012 at 8:06 AM, Erik Hatcher  wrote:
> Also note that an EmbeddedSolrServer is for a specific core, not for all
> cores (and thus solr.xml is not used).  My hunch is that you need to point
> to the core's home directory, not to the parent of solr.xml.

Oh, interesting.  The example on the Solr wiki had given me the
impression that it's the other way, but I'll definitely give that a
try.  Thanks for the heads-up.


Phil


Re: "no such core" error with EmbeddedSolrServer

2012-01-06 Thread Phillip Rhodes
On Fri, Jan 6, 2012 at 3:06 AM, Sven Maurmann  wrote:
> Hi,
>
> from your snippets the reason is not completely clear. There are a number of 
> reasons for not starting up the
> server. For example in case of a faulty configuration of the core 
> (solrconfig.xml, schema.xml) the core does
> not start and you get the reported error.

Yeah, that I noticed... I had some such errors earlier, that I noticed
when starting the Solr / Jetty standalone instance, but those have
been resolved, and now I can launch Solr as a process, and use the
SolrJ implementation that talks http to it - from my program - and
everything works as expected.  But still no joy with the
EmbeddedSolrServer.  :-(

If nothing obvious jumps out at anybody, I guess I'll just put a
breakpoint in the code that initializes the EmbeddedSolrServer and
start stepping through it and see if I can sort out what's going on...


Phil


"no such core" error with EmbeddedSolrServer

2012-01-05 Thread Phillip Rhodes
Hi all, I'm having an issue that I hope someone can shed some light on.

I have a Groovy program, using Solr 3.5, where I am attempting to use
EmbeddedSolrServer using the instructions shown here:

http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

to that end, I have code setup like this:

...

System.setProperty('solr.solr.home',
'/usr/servers/solr/apache-solr-3.5.0/example/heceta');
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();

EmbeddedSolrServer solrServer = new EmbeddedSolrServer(coreContainer, '' );

to initialize the solrServer.  But when I try to add a doc using the solrServer
instance, I get an exception:

org.apache.solr.common.SolrException: No such core:
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:104)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
        at org.apache.solr.client.solrj.SolrServer$add.call(Unknown Source)
        at org.fogbeam.exp.IndexerWithOwnerInfo.indexFile(IndexerWithOwnerInfo.groovy:150)


My solr.xml looks like this:



  
  

  




Can somebody tell me if the documentation on how to set this up is
wrong, or if there is a solr bug, or if there
is just something I've missed in my configuration?


Thanks,


Phillip


Build query programmatically with lucene, but issue to solr?

2010-05-28 Thread Phillip Rhodes
Hi.
I am building up a query with quite a bit of logic such as parentheses, plus
signs, etc... and it's a little tedious dealing with it all at a string
level.  I was wondering if anyone has any thoughts on constructing the query
in lucene and using the string representation of the query to send to solr.

Thanks,
Phillip
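
One way to take some of the tedium out of the string handling is a handful of small helpers that add the parentheses, plus signs, and escaping. This is only a sketch using the JDK, not the Lucene API; `QueryStrings` and its methods are made-up names, and the `status` and `owner` fields are hypothetical. SolrJ has its own `ClientUtils.escapeQueryChars` for the escaping part.

```java
final class QueryStrings {
    // Characters with special meaning in the Lucene query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    /** Backslash-escapes special characters and whitespace in a raw term value. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** field:value with the value escaped. */
    static String term(String field, String value) {
        return field + ":" + escape(value);
    }

    /** Marks a clause as required with a leading plus sign. */
    static String required(String clause) {
        return "+" + clause;
    }

    /** Parenthesized group where any clause may match. */
    static String anyOf(String... clauses) {
        return "(" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        String q = required(anyOf(term("status", "open"), term("status", "pending")))
                + " " + required(term("owner", "phil rhodes"));
        System.out.println(q);   // prints: +(status:open OR status:pending) +owner:phil\ rhodes
    }
}
```

As far as I know, the string form of a Lucene `Query` object is not guaranteed to parse back into the same query, so building the string directly like this may be the safer route.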


serialize SolrInputDocument to java.io.File and back again?

2009-12-30 Thread Phillip Rhodes
I want to store a SolrInputDocument to the filesystem until it can be sent
to the solr server via the solrj client.

I will be using a quartz job to periodically query a table that contains a
listing of SolrInputDocuments stored as java.io.File that need to be
processed.

Thanks for your time.
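
A minimal sketch of the spooling idea using plain Java serialization. It uses a `LinkedHashMap` as a stand-in for the document so that it runs with only the JDK. The assumption, which should be checked against the SolrJ version in use, is that `SolrInputDocument` implements `java.io.Serializable` and can be passed to the same two methods. `DocSpool` and the field names are made up.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;

final class DocSpool {
    /** Writes one serializable document to the given file. */
    static void write(Serializable doc, File target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(target))) {
            out.writeObject(doc);
        }
    }

    /** Reads back whatever object was stored in the file. */
    static Object read(File source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(source))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a SolrInputDocument (assumption, see the note above).
        LinkedHashMap<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("title", "hello");

        File f = File.createTempFile("solr-doc-", ".ser");
        f.deleteOnExit();
        write(doc, f);
        System.out.println(read(f));   // prints: {id=doc-1, title=hello}
    }
}
```

The periodic job would then list the spooled files, read each one back, send it through the solrj client, and delete the file once the add succeeds.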


Re: using solr as master for data storage/retrieval?

2008-05-08 Thread Phillip Rhodes

B,
My thoughts are coming from experience while writing and using stitches.  
Stitches is a java-based project that allows local and remote java clients 
(using hessian for java, xfire for dotnet) to search, store and retrieve images 
and image meta data.  We are using it to store 10 Gb's of images and the search 
is wicked fast.  We use it to allow users to associate images to galleries, 
events, etc..  It's using compass/lucene right now.  The API is a lot like 
amazon S3, but this is just a coincidence of solving the same problem.  We are 
using it in dotnet and java.

I was thinking that one of the benefits of solr is that of replication.  
Currently, there is only one production instance of stitches, and we are using 
it to power image serving, image thumbnails for 5 major sites.   If stitches 
goes down, these sites would not have images, and I would be in trouble.

By using solr, I was hoping I could get more scalability by leveraging the 
rsync/replication so my search index and its data (image binary files) would
be clustered across multiple machines. 

Thanks!


-Original Message-
From: Norberto Meijome <[EMAIL PROTECTED]>
Sent: Thursday, May 8, 2008 3:37am
To: solr-user@lucene.apache.org
Subject: Re: using solr as master for data storage/retrieval?

On Wed, 7 May 2008 11:26:50 -0400 (EDT)
"Phillip Rhodes" <[EMAIL PROTECTED]> wrote:

> I currently have a java-based application that stores all objects on the file 
> system (text, blobs) and uses lucene to search the objects.  If I can store 
> these objects in solr, I would greatly increase the scalability of my 
> application.

Hi Phillip,
I'm interested in why you think that SOLR would be more scalable than FS for 
retrieving files. I am not saying it isn't. For my use cases, it wouldn't be,
but your case may be different.

Do you have any numbers / theories that make you think that would be the case? 

How big are the files you intend to store? What file types? how many queries / 
sec do you expect to serve?

cheers,
B
_
{Beto|Norberto|Numard} Meijome

Intelligence: Finding an error in a Knuth text.
Stupidity: Cashing that $2.56 check you got.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.




using solr as master for data storage/retrieval?

2008-05-07 Thread Phillip Rhodes

I currently have a java-based application that stores all objects on the file 
system (text, blobs) and uses lucene to search the objects.  If I can store 
these objects in solr, I would greatly increase the scalability of my 
application.

Would it be safe to replace the filesystem with solr in this case?  

Thank you.






custom queries via plugins?

2008-05-05 Thread Phillip Rhodes
I am currently using lucene directly to build custom queries.  Can I write a 
plugin to build these custom BooleanQueries, RangeQueries, etc...?  As a simple 
example, we have documents that represent coupons, events and activities.  Some 
searches may only be for coupons and events. Currently, I programmatically 
build up a boolean query for this.  I wanted to know if I could still do this 
with solr.
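
For the coupons-and-events case specifically, a plugin may not be needed at all: the restriction can probably be expressed as a filter query on the request. The sketch below only builds the request URL with the JDK. The `type` field, the base URL, and the `TypeFilterUrl` class are assumptions for illustration, while `q` and `fq` are standard Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class TypeFilterUrl {
    /** URL-encodes a parameter value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Builds a /select URL that matches everything, filtered to the given document types. */
    static String selectUrl(String solrBase, String typeField, String... types) {
        String filter = typeField + ":(" + String.join(" OR ", types) + ")";
        return solrBase + "/select?q=" + enc("*:*") + "&fq=" + enc(filter);
    }

    public static void main(String[] args) {
        // Hypothetical base URL and field name.
        System.out.println(selectUrl("http://localhost:8983/solr", "type", "coupon", "event"));
        // prints: http://localhost:8983/solr/select?q=*%3A*&fq=type%3A%28coupon+OR+event%29
    }
}
```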

I just wanted to get a little bit of validation before investing a few hours 
into actually trying to use solr.  I have been reading the tutorials, docs, but 
while I suspect that solr exposes the lucene query via plugins, I have not seen 
this spelled out (but I'm a bad speller;)



Thank you for your time.

Phillip



solr vs custom JMS for replication?

2007-03-08 Thread Phillip Rhodes
Hi everyone,
I have an open source app under development called "authsum" which is a
sso/identity/authorization server that supports user registration, openid, sso.
It's a "search engine for authorizations" because the authorizations are stored 
in a lucene index accessible via xfire.  There will be a dotnet and ruby 
client.  http://www.authsum.org/overview/index.html

I am using JMS to keep my lucene indexes in sync.

My applications (admin, registration, and login applications) publish
messages (i.e. userid, group addition, etc) onto a JMS topic.
Other running applications subscribe to the topic and process
the index changes.

I am trying to "cut down" on the engineering and was wondering if solr would be 
a better fit for my needs.  

As I see it, my custom JMS solution means that there are potentially many 
IndexWriters out there (and more processing) since the same processing work 
needs to be performed on all indexes.  This could also be a problem since there 
is more of a possibility that indexes could get out of sync with one another.  
For these reasons, I am thinking that solr would be better for me than JMS.

The drawbacks:
1) I would need to rewrite my application to post xml documents to solr vs. the
lucene programming that I do now.
2) Do I have direct access to the lucene index to do queries?  Or do I need to 
rewrite my app for that also?
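
To give a feel for drawback 1, here is a minimal sketch of building the XML message that Solr's update handler accepts for adding a document. It only constructs the string; posting it to the `/update` URL is left out. `SolrAddXml` and the `id` and `group` fields are made-up examples and would have to match the real schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrAddXml {
    /** Escapes the characters that are not allowed raw in XML text or attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds an <add> message containing one document with the given fields. */
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "user-42");            // hypothetical field names and values
        fields.put("group", "admins & ops");
        System.out.println(addMessage(fields));
        // prints: <add><doc><field name="id">user-42</field><field name="group">admins &amp; ops</field></doc></add>
    }
}
```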