Storing metadata from post parameters and XML

2011-01-10 Thread Walter Closenfleight
I'm very unclear on how to associate what I need with a Solr index entry.
Based on what I've read thus far, you can extract data from text files and
store it in a Solr document.

I have hundreds of thousands of documents in a database/svn type system.
When I index a file, it is likely going to be local to the filesystem, and I
know the location it will take on in the database. So, when I index, I want
to provide a path so that the file can be found when someone else does a search.

123.xml may look like:


my title
Every foobar has its day
My caption


and the proprietary location I want it to be associated with is:

/abc/def/ghi/123.xml

So, when a user does a search for "foobar", it returns some information
about 123.xml but most importantly the location should be available.

I have yet to find (in the schema.xml or otherwise) where you can define
that path to store, and how you would pass along that parameter in the
indexing of that document.

Instead, from the examples I can find, including the book, you store fields
from your data into the index. In the book's examples (a music database),
searching for "Cherub Rock" returns a list of tracks with their duration, track
name, album name, and artist. In other words, the full text data you
retrieve is the only information the search index has to offer.

Just for example, using the exampledocs post.jar, I'm envisioning something
like this:

java -jar post.jar 123.xml -dblocation "/abc/def/ghi/123.xml" -othermeta1 "xxx" -othermeta2 "zzz"

Then the Solr doc would look like:

123
/abc/def/ghi/123.xml
xxx
zzz
my title
/abc/xxx.gif
Every foobar has its day My caption


This way, when a user searches for foobar, they get item 123 back, review
the search result and if they decide that's the data they want, they can use
the dblocation field to retrieve the data for editing purposes (and then
re-index it following their edits).

I'm guessing I just haven't found the right terms yet to look into, as I'm
very new to this. Thanks for any direction you can provide. Also, if Solr
appears to be the wrong tool for what I need, let me know as well!

Thank you,
Walter


Re: Storing metadata from post parameters and XML

2011-01-10 Thread Walter Closenfleight
Stefan,

You're right. I was attempting to post some quick pseudo-code, but it was
pretty misleading. They should have been  elements, like /abc/def/ghi/123.xml,
or something to that effect.

Thanks,
Walter


On Mon, Jan 10, 2011 at 10:08 AM, Stefan Matheis <
matheis.ste...@googlemail.com> wrote:

> Hey Walter,
>
> What's wrong with just putting your db-location in a 'string' field and
> using it like any other value?
> There is no special field-type for something like a
> path/directory/location-information, afaik.
>
> Regards
> Stefan
>
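Stefan's suggestion can be made concrete with Solr's XML update format (<add><doc><field name="...">): the repository location is sent as one more field next to the indexed text. The following is a minimal sketch using only the JDK, assuming schema.xml declares a stored "string" field named dblocation. The field names other than id (dblocation, othermeta1, title) are hypothetical placeholders taken from the pseudo-command above and must match fields that actually exist in the schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build a Solr XML <add> message where the repository location is just
// another field. Field names other than "id" are hypothetical and must exist in
// schema.xml (e.g. dblocation as a stored "string" field so it comes back verbatim).
final class SolrAddXml {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps each name/value pair in a <field> element inside a single <doc>.
    static String addDoc(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "123");
        doc.put("dblocation", "/abc/def/ghi/123.xml"); // where the editable original lives
        doc.put("othermeta1", "xxx");
        doc.put("title", "my title");
        System.out.println(addDoc(doc));
    }
}
```

The printed message could be saved to a file and posted to /update with post.jar; a search that matches the document can then return dblocation like any other stored field.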


Solr - search queries not returning results

2011-06-28 Thread Walter Closenfleight
Hello everyone,

I believe I am missing something very elementary. The following query
returns zero hits:

http://localhost:8983/solr/core0/select/?q=testabc

However, using solritas, it finds many results:

http://localhost:8983/solr/core0/itas?q=testabc

Do you have any idea what the issue may be?

Thanks in advance!


Re: Solr - search queries not returning results

2011-06-29 Thread Walter Closenfleight
Thanks to both of you, I understand now and am now getting the expected
results.

Cheers!

On Wed, Jun 29, 2011 at 2:21 AM, Ahmet Arslan  wrote:

>
> > I believe I am missing something very elementary. The following query
> > returns zero hits:
> >
> > http://localhost:8983/solr/core0/select/?q=testabc
>
> With this URL, you are hitting the RequestHandler defined as
>  in your
> core0/conf/solrconfig.xml.
>
> > However, using solritas, it finds many results:
> >
> > http://localhost:8983/solr/core0/itas?q=testabc
>
> With this one, you are hitting the handler registered with name="/itas".
>
> > Do you have any idea what the issue may be?
>
> Probably they have different default parameters configured.
>
> For example, one may use (e)dismax and the other the Lucene query parser.
> The Lucene query parser searches for testabc in your default field; dismax
> searches for it in all of the fields listed in the qf parameter.
>
> You can see the full parameter list by appending &echoParams=all to your
> search URL.
>
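Ahmet's two suggestions can be turned into concrete request URLs: ask both handlers for the same query with echoParams=all and compare the echoed defaults, then pass the parser settings to /select explicitly. This is a minimal JDK-only sketch; the qf field list is a hypothetical placeholder (copy the real value from the /itas handler's defaults), and defType=edismax assumes Solr 3.1 or later (Solr 1.4 offers dismax).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build the diagnostic URLs described above. The qf field list is a
// hypothetical placeholder; copy the real one from the /itas handler's defaults.
final class SolrQueryUrl {

    // Appends URL-encoded key=value pairs to the base URL, '?' first, then '&'.
    static String build(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(sep).append(enc(p.getKey())).append('=').append(enc(p.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/core0";

        // 1. Echo every parameter each handler actually applies, then compare them.
        for (String handler : new String[] {"/select/", "/itas"}) {
            Map<String, String> p = new LinkedHashMap<>();
            p.put("q", "testabc");
            p.put("echoParams", "all");
            System.out.println(build(core + handler, p));
        }

        // 2. Ask /select to parse the query the way a dismax-style handler would.
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "testabc");
        p.put("defType", "edismax");
        p.put("qf", "title text"); // hypothetical field list
        System.out.println(build(core + "/select/", p));
    }
}
```

Passing defType and qf per request leaves solrconfig.xml untouched; once the right values are known, they can be moved into the handler's defaults instead.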


Methods for preserving text entities?

2011-06-29 Thread Walter Closenfleight
We have some text entities in fields to index (and search) like so:

Solr is a really &myword; search engine!

I would like to preserve/protect &myword; and not resolve it in the indexing
or search results.

What sort of methods have people used? I realize the results are returned in
XML format, so preserving these text entities may be hard. Are people
replacing the "&" character or doing something else?

Thanks in advance!


Re: Methods for preserving text entities?

2011-06-29 Thread Walter Closenfleight
Ah, I think I suddenly answered my own question, but I'd appreciate further
insight if you have it. I converted the & in &myword; to &amp; so it
looks like this:

Solr is a really &amp;myword; search engine!
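The escaping step described above can be checked with the JDK's own XML parser: once & is written as &amp;, the parsed field value contains the literal characters &myword;, so nothing tries to resolve the undeclared entity. When Solr later returns that value in an XML response, it is escaped as &amp;myword; again, and an XML client decodes it back to &myword;. This is a minimal sketch; the field name "text" is a hypothetical placeholder.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: escape '&' before building Solr XML so that &myword; travels as
// literal text instead of being treated as an (undeclared) entity reference.
final class EntityEscape {

    // '&' must be replaced first; otherwise the '&' introduced by "&lt;"
    // and "&gt;" would itself be escaped a second time.
    static String escapeText(String raw) {
        return raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Parses an XML string with the JDK parser and returns the root element's
    // text, roughly what an XML consumer such as Solr's update handler sees.
    static String parsedText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String raw = "Solr is a really &myword; search engine!";
        // "text" is a hypothetical field name
        String xml = "<field name=\"text\">" + escapeText(raw) + "</field>";
        System.out.println(xml);             // the &amp;myword; form that gets posted
        System.out.println(parsedText(xml)); // the parsed value: &myword; is intact
    }
}
```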





Multicore clustering setup problem

2011-06-29 Thread Walter Closenfleight
I had set up the clusteringComponent in solrconfig.xml for my first core. It
has been working fine, and now I want to get my next core working. I set up
the second core with the clustering component so that I could use it, use
solritas properly, etc., but Solr did not like the solrconfig.xml changes for
the second core. I'm getting this error when Solr is started or when I hit a
Solr-related URL:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent'

Should the clusteringComponent be set up in a shared configuration file
somehow or is there something else I am doing wrong?

Thanks in advance!


Re: Multicore clustering setup problem

2011-06-30 Thread Walter Closenfleight
lrResourceLoader.java:359)
 ... 38 more
Jun 30, 2011 8:51:23 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/opt/tomcat/solr/myapp

On Thu, Jun 30, 2011 at 12:57 AM, Stanislaw Osinski <
stanislaw.osin...@carrotsearch.com> wrote:

> Hi,
>
> Can you post the full stack trace? I'd need to know if it's really
> org.apache.solr.handler.clustering.ClusteringComponent that's missing
> or some other class ClusteringComponent depends on.
>
> Cheers,
>
> Staszek
>


Re: Multicore clustering setup problem

2011-06-30 Thread Walter Closenfleight
Staszek,

That makes sense, but this has always been a multi-core setup, so the paths
have not changed, and the clustering component worked fine for core0. The
only new thing is that I have fine-tuned core1 (to begin implementing it).
Previously its solrconfig.xml file was very basic. I replaced it with
core0's solrconfig.xml and made very minor changes to it (unrelated to
clustering); it's a nearly identical solrconfig.xml file, so I'm surprised
it doesn't work for core1.

In other words, the paths here are the same for core0 and core1:
  
  
  
  
Again, I'm wondering whether, since both cores use the clustering
component, it should have a shared configuration in a separate file used
by both cores. Perhaps the duplicate clusteringComponent configuration
in both cores is the problem?

Thanks for looking at this!

On Thu, Jun 30, 2011 at 1:29 PM, Stanislaw Osinski <
stanislaw.osin...@carrotsearch.com> wrote:

> It looks like the whole clustering component JAR is not in the classpath. I
> remember that I once dealt with a similar issue in Solr 1.4, and the cause
> was the relative path of the <lib> tag being resolved against the core's
> instanceDir, which made the path incorrect when directly copying and
> pasting from the single-core configuration. Try correcting the relative
> <lib> paths or replacing them with absolute ones; it should solve the problem.
>
> Cheers,
>
> Staszek
>
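Staszek's explanation can be illustrated with java.nio path resolution: a relative <lib dir="..."> value is resolved against the core's instanceDir, so the same string can land in different directories depending on how deep the core sits. All directory names below are hypothetical examples, not Walter's actual layout. The last comparison also reflects Walter's point: for sibling cores the same relative path resolves identically, so if core0 finds the JAR, the next thing worth comparing would be the instanceDir each core is actually given (for example in solr.xml).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: a relative <lib dir="..."> value is resolved against the core's
// instanceDir. All directory names below are hypothetical examples.
final class LibDirResolution {

    // Resolves libDir against instanceDir and removes "." / ".." segments.
    static Path resolve(String instanceDir, String libDir) {
        return Paths.get(instanceDir).resolve(libDir).normalize();
    }

    public static void main(String[] args) {
        String lib = "../../contrib/clustering/lib";

        // Single-core layout: instanceDir two levels below the distribution root.
        System.out.println(resolve("/opt/solr/example/solr", lib));
        // -> /opt/solr/contrib/clustering/lib

        // Same string copied into a core nested one level deeper.
        System.out.println(resolve("/opt/solr/example/multicore/core1", lib));
        // -> /opt/solr/example/contrib/clustering/lib (wrong directory)

        // Sibling cores resolve identically, so core0 and core1 agree with each other.
        System.out.println(resolve("/opt/solr/example/multicore/core0", lib)
                .equals(resolve("/opt/solr/example/multicore/core1", lib)));
        // -> true
    }
}
```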


Solrj - when a request fails

2011-09-23 Thread Walter Closenfleight
I have a Java program that sends thousands of Solr XML files to Solr
using the following code. It works fine until there is a problem with one of
the Solr XML files. The code fails on the solrServer.request(up) line, but
it does not throw an exception, so my application cannot catch it and
recover, and the whole application dies.

I've fixed the individual file that made it fail, but I want to trap
these errors better so my application does not die.

Thanks for any insight you can provide. Java code and log below:


// ... start of a loop to process each file removed ...

try {

   String xml = read(filename);
   DirectXmlRequest up = new DirectXmlRequest( "/update", xml );

   solrServer.request( up );
   solrServer.commit();

} catch (SolrServerException e) {
   log.warn("Exception: "+ e.toString());
   throw new MyException(e);
} catch (IOException e) {
   log.warn("Exception: "+ e.toString());
   throw new MyException(e);
}
DEBUG >> "[\n]" - (Wire.java:70)
DEBUG Request body sent - (EntityEnclosingMethod.java:508)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "Server: Apache-Coyote/1.1[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Type: text/html;charset=utf-8[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Length: 1271[\r][\n]" - (Wire.java:70)
DEBUG << "Date: Fri, 23 Sep 2011 12:08:05 GMT[\r][\n]" - (Wire.java:70)
DEBUG << "Connection: close[\r][\n]" - (Wire.java:70)
DEBUG << "[\r][\n]" - (Wire.java:70)
DEBUG << "Apache Tomcat/6.0.29 - Error
report
HTTP Status 400 - Unexpected character 'x' (code 120) in
prolog; expected '<'[\n]" - (Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]type Status reportmessage
Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
(Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]description
" - (Wire.java:84)
DEBUG << "The request sent by the client was syntactically incorrect
(Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
(Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]).Apache Tomcat/6.0.29" -
(Wire.java:84)
DEBUG Should close connection in response to directive: close -
(HttpMethodBase.java:1008)


Re: Solrj - when a request fails

2011-09-23 Thread Walter Closenfleight
I tried that with the same results. You would think I would get the
exception back from Solr so I could trap it; instead, I lose all the
requests after it.

On Fri, Sep 23, 2011 at 8:33 AM, Gunther, Andrew  wrote:

> All the Solr methods look like they should throw those two exceptions.
> Have you tried the DirectXmlRequest process() method?
>
> up.process(solrServer);
>
>   public UpdateResponse process( SolrServer server )
>       throws SolrServerException, IOException
>   {
>     long startTime = System.currentTimeMillis();
>     UpdateResponse res = new UpdateResponse();
>     res.setResponse( server.request( this ) );
>     res.setElapsedTime( System.currentTimeMillis()-startTime );
>     return res;
>   }
>
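One plausible reading of the log, stated as an assumption rather than something confirmed in the thread: in SolrJ 3.x, an HTTP error status such as this 400 is, to my understanding, reported as org.apache.solr.common.SolrException, which extends RuntimeException, so neither catch (SolrServerException) nor catch (IOException) sees it and it propagates out of the loop. Separately, the existing catch blocks rethrow MyException, which would also end the loop. The JDK-only sketch below shows the per-file pattern: catch Exception (covering checked and unchecked failures) inside the loop, record the file, and continue. FileSender is a hypothetical stand-in for the read/DirectXmlRequest/request code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Sketch: isolate each file's failure so one bad file is logged and skipped.
// FileSender is a hypothetical stand-in for the read/DirectXmlRequest/request code.
final class BatchIndexer {

    private static final Logger LOG = Logger.getLogger(BatchIndexer.class.getName());

    interface FileSender {
        void send(String filename) throws Exception;
    }

    // Tries every file and returns the names of the files that failed.
    static List<String> sendAll(List<String> filenames, FileSender sender) {
        List<String> failed = new ArrayList<>();
        for (String filename : filenames) {
            try {
                sender.send(filename);
            } catch (Exception e) { // checked and unchecked (RuntimeException) failures
                LOG.warning("Skipping " + filename + ": " + e);
                failed.add(filename);
                // no rethrow: the loop moves on to the next file
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("good.xml", "bad.xml", "also-good.xml");
        List<String> failed = sendAll(files, name -> {
            if (name.startsWith("bad")) {
                // simulates an HTTP 400 surfacing as an unchecked exception
                throw new IllegalStateException("400 Bad Request");
            }
        });
        System.out.println("Failed: " + failed); // Failed: [bad.xml]
    }
}
```

In the real loop, the lambda body would hold the read, DirectXmlRequest and request calls from the original code. Calling commit() once after the loop, rather than once per file, is also worth considering, since each commit is comparatively expensive.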