Re: Debugging custom RequestHandler: spinning up a core for debugging

2017-12-22 Thread Tod Olson
Thanks, that pointed me in the right direction! The problem was an ancient ICU 
library in the distributed code.

-Tod
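
For anyone chasing the same "Break Iterator Rule Data Magic Number Incorrect" error, one quick check is to print where the JVM is actually loading ICU4J from; a minimal sketch, using com.ibm.icu.text.BreakIterator purely as a probe class (the class name and this check are not part of the original exchange):

import com.ibm.icu.text.BreakIterator;

public class WhichIcuJar {
    public static void main(String[] args) {
        // Prints the jar (or directory) the ICU4J classes were loaded from,
        // which makes a stale copy on the classpath easy to spot.
        // (getCodeSource() can be null for bootstrap classes, but not for a jar like ICU4J.)
        System.out.println(
            BreakIterator.class.getProtectionDomain().getCodeSource().getLocation());
    }
}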

On Dec 15, 2017, at 5:15 PM, Erick Erickson <erickerick...@gmail.com> wrote:

My guess is this isn't a Solr issue at all; you are somehow using an old Java.

RBBIDataWrapper is from

com.ibm.icu.text;

I saw on a quick Google that this was cured by re-installing Eclipse,
but that was from 5 years ago.

You say your Java and IDE skills are a bit rusty, maybe you haven't
updated your Java JDK or Eclipse in a while? I don't know if Eclipse
somehow has its own Java (I haven't used Eclipse for quite a while).

I take it this runs outside Eclipse OK? (well, with problems otherwise
you wouldn't be stepping through it.)

Best,
Erick

On Fri, Dec 15, 2017 at 1:16 PM, Tod Olson <t...@uchicago.edu> wrote:
Hi everyone,

I need to do some step-wise debugging on a custom RequestHandler. I'm trying to 
spin up a core in a Junit test, with the idea of running it inside of Eclipse 
for debugging. (If there's an easier way, I'd like to see a walk through!) 
Problem is the core fails to spin up with:

java.io.IOException: Break Iterator Rule Data Magic Number Incorrect, or 
unsupported data version

Here's the code, just trying to load (cribbed and adapted from 
https://stackoverflow.com/questions/45506381/how-to-debug-solr-plugin):

public class BrowseHandlerTest
{
   private static CoreContainer container;
   private static SolrCore core;

   private static final Logger logger = Logger.getGlobal();



   @BeforeClass
   public static void prepareClass() throws Exception
   {
   String solrHomeProp = "solr.solr.home";
   System.out.println(solrHomeProp + "= " + 
System.getProperty(solrHomeProp));
   // create the core container from the solr.solr.home system property
   container = new CoreContainer();
   container.load();
   core = container.getCore("biblio");
   
   logger.info("Solr core loaded!");
   }

   @AfterClass
   public static void cleanUpClass()
   {
   core.close();
   container.shutdown();
   
   logger.info("Solr core shut down!");
   }
}
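
Once the core does load, the handler itself can be driven directly from a test like this. A rough sketch, assuming the custom handler is registered in solrconfig.xml under a name such as "/browse" (the registration name and the "from" parameter are assumptions, not taken from this thread):

import org.junit.Test;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrRequestHandler;
import org.apache.solr.response.SolrQueryResponse;

   @Test
   public void testBrowseHandler()
   {
       ModifiableSolrParams params = new ModifiableSolrParams();
       params.set("from", "Dickens");                 // hypothetical handler parameter
       SolrQueryRequest req = new LocalSolrQueryRequest(core, params);
       try {
           SolrQueryResponse rsp = new SolrQueryResponse();
           SolrRequestHandler handler = core.getRequestHandler("/browse"); // assumed name
           handler.handleRequest(req, rsp);
           System.out.println(rsp.getValues());       // inspect or assert on the response
       } finally {
           req.close();                               // releases the searcher reference
       }
   }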

The test, run through ant, fails as follows:

   [junit] solr.solr.home= /Users/tod/src/vufind/solr/vufind
   [junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
   [junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
   [junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
   [junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
   [junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for 
further details.
   [junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
1.299 sec
   [junit]
   [junit] - Standard Error -
   [junit] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   [junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
   [junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
   [junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
   [junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
   [junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for 
further details.
   [junit] -  ---
   [junit] Testcase: org.vufind.solr.handler.tests.BrowseHandlerTest: Caused an 
ERROR
   [junit] SolrCore 'biblio' is not available due to init failure: JVM Error 
creating core [biblio]: null
   [junit] org.apache.solr.common.SolrException: SolrCore 'biblio' is not 
available due to init failure: JVM Error creating core [biblio]: null
   [junit]  at 
org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1066)
   [junit]  at 
org.vufind.solr.handler.tests.BrowseHandlerTest.prepareClass(BrowseHandlerTest.java:45)
   [junit] Caused by: org.apache.solr.common.SolrException: JVM Error creating 
core [biblio]: null
   [junit]  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:833)
   [junit]  at 
org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
   [junit]  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
   [junit]  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
   [junit]  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [junit]  at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
   [junit]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   [junit]  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolE

Debugging custom RequestHandler: spinning up a core for debugging

2017-12-15 Thread Tod Olson
Hi everyone,

I need to do some step-wise debugging on a custom RequestHandler. I'm trying to 
spin up a core in a Junit test, with the idea of running it inside of Eclipse 
for debugging. (If there's an easier way, I'd like to see a walk through!) 
Problem is the core fails to spin up with:

java.io.IOException: Break Iterator Rule Data Magic Number Incorrect, or 
unsupported data version

Here's the code, just trying to load (cribbed and adapted from 
https://stackoverflow.com/questions/45506381/how-to-debug-solr-plugin):

public class BrowseHandlerTest
{
private static CoreContainer container;
private static SolrCore core;

private static final Logger logger = Logger.getGlobal();



@BeforeClass
public static void prepareClass() throws Exception
{
String solrHomeProp = "solr.solr.home";
System.out.println(solrHomeProp + "= " + 
System.getProperty(solrHomeProp));
// create the core container from the solr.solr.home system property
container = new CoreContainer();
container.load();
core = container.getCore("biblio");
logger.info("Solr core loaded!");
}

@AfterClass
public static void cleanUpClass()
{
core.close();
container.shutdown();
logger.info("Solr core shut down!");
}
}

The test, run through ant, fails as follows:

[junit] solr.solr.home= /Users/tod/src/vufind/solr/vufind
[junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
[junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
[junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
[junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
[junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for 
further details.
[junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
1.299 sec
[junit]
[junit] - Standard Error -
[junit] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
[junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
[junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
[junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
[junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for 
further details.
[junit] -  ---
[junit] Testcase: org.vufind.solr.handler.tests.BrowseHandlerTest: Caused 
an ERROR
[junit] SolrCore 'biblio' is not available due to init failure: JVM Error 
creating core [biblio]: null
[junit] org.apache.solr.common.SolrException: SolrCore 'biblio' is not 
available due to init failure: JVM Error creating core [biblio]: null
[junit]  at 
org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1066)
[junit]  at 
org.vufind.solr.handler.tests.BrowseHandlerTest.prepareClass(BrowseHandlerTest.java:45)
[junit] Caused by: org.apache.solr.common.SolrException: JVM Error creating 
core [biblio]: null
[junit]  at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:833)
[junit]  at 
org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
[junit]  at 
org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
[junit]  at 
org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
[junit]  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit]  at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
[junit]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit]  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit]  at java.lang.Thread.run(Thread.java:745)
[junit] Caused by: java.lang.ExceptionInInitializerError
[junit]  at 
org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory.inform(ICUTokenizerFactory.java:107)
[junit]  at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:721)
[junit]  at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:160)
[junit]  at 
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:56)
[junit]  at 
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:70)
[junit]  at 
org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:108)
[junit]  at 
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:79)
[junit]  at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
[junit] Caused by: java.lang.RuntimeException: java.io.IOExceptio

Re: Compile problems with anonymous SimpleCollector in custom request handler

2017-11-30 Thread Tod Olson
Shawn,

Thanks for the response! Yes, that was it, an older version unexpectedly in the 
classpath.

And for the benefit of anyone who searches the list archive with a similar 
debugging need, it's pretty easy to print out the classpath from ant's 
build.xml. (The XML itself was stripped by the list archive; what survives is the echo line below, which printed a ${classpathProp} property holding the classpath.)

  Classpath: ${classpathProp}


-Tod

On Nov 29, 2017, at 6:00 PM, Shawn Heisey <apa...@elyograg.org> wrote:

On 11/29/2017 2:27 PM, Tod Olson wrote:
I'm modifying an existing custom request handler for an open source project, and 
am looking for some help with a compile error around an anonymous 
SimpleCollector. The build failure message from ant and the source of the 
specific method are below. I am compiling on a Mac with Java 1.8 and Solr 
6.4.2. There are two things I do not understand.

First:
   [javac] 
/Users/tod/src/vufind-browse-handler/browse-handler/java/org/vufind/solr/handler/BrowseRequestHandler.java:445:
 error:  is not abstract and does 
not override abstract method setNextReader(AtomicReaderContext) in Collector
   [javac] db.search(q, new SimpleCollector() {

Based on the javadoc, neither SimpleCollector nor Collector define a 
setNextReader(AtomicReaderContext) method. Grepping through the Lucene 6.4.2 
source reveals neither a setNextReader method (though maybe a couple archaic 
comments), nor an AtomicReaderContext class or interface.



Second:
   [javac] method IndexSearcher.search(Query,Collector) is not applicable
   [javac]   (argument mismatch;  cannot be 
converted to Collector)

How is it that SimpleCollector cannot be converted to Collector? Perhaps this 
is just a consequence of the first error.

For the first error:  What version of Solr/Lucene are you compiling
against?  I have found that Collector *did* have a setNextReader method
up through Lucene 4.10.4, but in 5.0, that method was gone.  I suspect
that what's causing your first problem is that you have older Lucene
jars (4.x or earlier) on your classpath, in addition to a newer version
that you actually want to use for the compile.

I think that can also explain the second problem.  It looks like
SimpleCollector didn't exist in Lucene 4.10, which is the last version
where Collector had setNextReader.  SimpleCollector is mentioned in the
javadoc for Collector as of 5.0, though.
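
For reference, against Lucene 5.x/6.x the anonymous collector is written against SimpleCollector's newer hooks (doSetNextReader instead of the old setNextReader). A minimal sketch, not the project's actual code:

import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SimpleCollector;

   void collectDocIds(IndexSearcher searcher, Query q) throws IOException {
       searcher.search(q, new SimpleCollector() {
           private int docBase;                 // doc id offset of the current segment

           @Override
           protected void doSetNextReader(LeafReaderContext context) {
               docBase = context.docBase;       // replaces setNextReader(AtomicReaderContext)
           }

           @Override
           public void collect(int doc) {
               int globalDoc = docBase + doc;   // per-segment id -> top-level id
               // record globalDoc ...
           }

           @Override
           public boolean needsScores() {
               return false;                    // plain doc-id collection, no scoring
           }
       });
   }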

Thanks,
Shawn





Compile problems with anonymous SimpleCollector in custom request handler

2017-11-29 Thread Tod Olson
Hi everyone,

I'm modifying an existing custom request handler for an open source project, and 
am looking for some help with a compile error around an anonymous 
SimpleCollector. The build failure message from ant and the source of the 
specific method are below. I am compiling on a Mac with Java 1.8 and Solr 
6.4.2. There are two things I do not understand.

First:
[javac] 
/Users/tod/src/vufind-browse-handler/browse-handler/java/org/vufind/solr/handler/BrowseRequestHandler.java:445:
 error:  is not abstract and does 
not override abstract method setNextReader(AtomicReaderContext) in Collector
[javac] db.search(q, new SimpleCollector() {

Based on the javadoc, neither SimpleCollector nor Collector define a 
setNextReader(AtomicReaderContext) method. Grepping through the Lucene 6.4.2 
source reveals neither a setNextReader method (though maybe a couple archaic 
comments), nor an AtomicReaderContext class or interface.

Second:
[javac] method IndexSearcher.search(Query,Collector) is not applicable
[javac]   (argument mismatch;  cannot be 
converted to Collector)

How is it that SimpleCollector cannot be converted to Collector? Perhaps this 
is just a consequence of the first error.

Any help getting past this compile problem would be most welcome!

-Tod




Build failure message:

build-handler:
[mkdir] Created dir: 
/Users/tod/src/vufind-browse-handler/build/browse-handler
[javac] Compiling 1 source file to 
/Users/tod/src/vufind-browse-handler/build/browse-handler
[javac] 
/Users/tod/src/vufind-browse-handler/browse-handler/java/org/vufind/solr/handler/BrowseRequestHandler.java:445:
 error:  is not abstract and does 
not override abstract method setNextReader(AtomicReaderContext) in Collector
[javac] db.search(q, new SimpleCollector() {
[javac]^
[javac] 
/Users/tod/src/vufind-browse-handler/browse-handler/java/org/vufind/solr/handler/BrowseRequestHandler.java:445:
 error: no suitable method found for search(TermQuery,)
[javac] db.search(q, new SimpleCollector() {
[javac]   ^
[javac] method IndexSearcher.search(Query,int) is not applicable
[javac]   (argument mismatch;  cannot be 
converted to int)
[javac] method IndexSearcher.search(Query,Filter,int) is not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method IndexSearcher.search(Query,Filter,Collector) is not 
applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method IndexSearcher.search(Query,Collector) is not applicable
[javac]   (argument mismatch;  cannot be 
converted to Collector)
[javac] method IndexSearcher.search(Query,Filter,int,Sort) is not 
applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method 
IndexSearcher.search(Query,Filter,int,Sort,boolean,boolean) is not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method IndexSearcher.search(Query,int,Sort) is not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method IndexSearcher.search(Weight,ScoreDoc,int) is not 
applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method 
IndexSearcher.search(List,Weight,ScoreDoc,int) is not 
applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method IndexSearcher.search(Weight,int,Sort,boolean,boolean) is 
not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method 
IndexSearcher.search(Weight,FieldDoc,int,Sort,boolean,boolean,boolean) is not 
applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method 
IndexSearcher.search(List,Weight,FieldDoc,int,Sort,boolean,boolean,boolean)
 is not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] method 
IndexSearcher.search(List,Weight,Collector) is not 
applicable
[javac]   (actual and formal argument lists differ in length)
[javac] 2 errors


Problem method:

/**
 *
 * Function to retrieve the doc ids when there is a building limit
 * This retrieves the doc ids for an individual heading
 *
 * Need to add a filter query to limit the results from Solr
 *
 * Includes functionality to retrieve additional info
 * like titles for call numbers, possibly ISBNs
 *
 * @param heading string of the heading to use for finding matching docs
 * @param fields colon-separated string of Solr fields
 *   to return for use in the browse display
 * @param maxBibListSize maximum numbers of records to check for fields
 * @return return a map of Solr ids and extra bib info
 */
public Map

Upgrading Tika in place

2013-02-05 Thread Tod
I'm running an older version of Solr - 3.4.0.2011.09.09.09.06.17.  It 
seems the version of Tika that came with it has trouble with some PDF 
files and newer Office documents.  I've checked the latest Tika release 
and it solves these problems.


I'd like to just drop in the necessary Tika jars without needing to 
rebuild or upgrade Solr.  Is that a possibility and if so how would I go 
about accomplishing it?  I see tika-core and tika-parsers in the 3.6.2 
Solr build distro, is that the only two files I need?



Thanks - Tod


Solr 3.6 parsing and extraction files

2012-04-18 Thread Tod
Could someone possibly provide me with a list of jars that I need to 
extract from the apache-solr-3.6.0.tgz file to enable the parsing and 
remote streaming of office style documents?  I assume (for a multicore 
configuration) they would go into ./tomcat/webapps/solr/WEB-INF/lib - 
correct?



Thanks - Tod


Indexing Using XML Message

2012-01-25 Thread Tod
I have a local data store containing a host of different document types. 
 This data store is separate from a remote Solr install making 
streaming not an option.  Instead I'd like to generate an XML file that 
contains all of the documents including content and metadata.


What would be the most appropriate way to accomplish this?  I could use 
the Tika CLI to generate XML but I'm not sure it would work or that its 
the most efficient way to handle things.  Can anyone offer some suggestions?



Thanks - Tod


Re: Help! - ContentStreamUpdateRequest

2011-11-16 Thread Tod

Erick,

Autocommit is commented out in solrconfig.xml.  I have avoided commits 
until after the indexing process is complete.  As an experiment I tried 
committing every n records processed to see if varying n would make a 
difference, it really didn't change much.


My original use case had the client running from the Solr server and 
streaming the document content over from a web server based on the URL 
gathered by a query from a backend database.  The locking problem 
appeared there first so I tried moving the client code to the web server 
to be closer to the documents' origin.  That helped a little but ended 
up locking which is where I am now.


Solr should be able to index way more documents than the 35K I'm trying 
to index.  It seems from other's accounts they are able to do what I'm 
trying to do successfully.  Therefore I believe I must be doing 
something extraordinarily dumb.  I'll be happy to share any information 
about my environment or configuration if it will help find my error.


Thanks for all of your help.


- Tod





On 11/15/2011 8:08 PM, Erick Erickson wrote:

That's odd. What are your autocommit parameters? And are you either
committing or optimizing as part of your program? I'd bump the
autocommit parameters up and NOT commit (or optimize) from your
client if you are

Best
Erick

On Tue, Nov 15, 2011 at 2:17 PM, Tod <listac...@gmail.com> wrote:

Otis,

The files are only part of the payload.  The supporting metadata exists in a
database.  I'm pulling that information, as well as the name and location of
the file, from the database and then sending it to a remote Solr instance to
be indexed.

I've heard Solr would prefer to get documents it needs to index in chunks
rather than one at a time as I'm doing now.  The one at a time approach is
locking up the Solr server at around 700 entries.  My thought was if I could
chunk them in a batch at a time the lockup will stop and indexing
performance would improve.


Thanks - Tod

On 11/15/2011 12:13 PM, Otis Gospodnetic wrote:


Hi,

How about just concatenating your files into one? Would that work for
you?

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




From: Tod <listac...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, November 14, 2011 4:24 PM
Subject: Help! - ContentStreamUpdateRequest

Could someone take a look at this page:

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

... and tell me what code changes I would need to make to be able to
stream a LOT of files at once rather than just one? It has to be something 
simple like a collection of some sort but I just can't get it figured out. 
Maybe I'm using the wrong class altogether?


TIA












Re: Help! - ContentStreamUpdateRequest

2011-11-15 Thread Tod

Otis,

The files are only part of the payload.  The supporting metadata exists 
in a database.  I'm pulling that information, as well as the name and 
location of the file, from the database and then sending it to a remote 
Solr instance to be indexed.


I've heard Solr would prefer to get documents it needs to index in 
chunks rather than one at a time as I'm doing now.  The one at a time 
approach is locking up the Solr server at around 700 entries.  My 
thought was if I could chunk them in a batch at a time the lockup will 
stop and indexing performance would improve.
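
For the plain metadata coming out of the database (as opposed to the rich files that have to go through /update/extract), chunked adds with SolrJ look roughly like the sketch below; the field names and batch size are placeholders, not taken from this thread:

import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

   void addBatch(SolrServer server) throws Exception {
       Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
       for (int i = 0; i < 100; i++) {                    // arbitrary chunk size
           SolrInputDocument doc = new SolrInputDocument();
           doc.addField("content_id", String.valueOf(i)); // placeholder field names
           doc.addField("title", "Document " + i);
           batch.add(doc);
       }
       server.add(batch);   // one HTTP request for the whole chunk
       server.commit();     // commit once per batch, not per document
   }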



Thanks - Tod

On 11/15/2011 12:13 PM, Otis Gospodnetic wrote:

Hi,

How about just concatenating your files into one? Would that work for you?

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




From: Tod <listac...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, November 14, 2011 4:24 PM
Subject: Help! - ContentStreamUpdateRequest

Could someone take a look at this page:

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

... and tell me what code changes I would need to make to be able to stream a 
LOT of files at once rather than just one? It has to be something simple like 
a collection of some sort but I just can't get it figured out. Maybe I'm using 
the wrong class altogether?


TIA







Help! - ContentStreamUpdateRequest

2011-11-14 Thread Tod

Could someone take a look at this page:

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

... and tell me what code changes I would need to make to be able to 
stream a LOT of files at once rather than just one?  It has to be 
something simple like a collection of some sort but I just can't get it 
figured out.  Maybe I'm using the wrong class altogether?



TIA


Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod
This is a code fragment of how I am doing a ContentStreamUpdateRequest 
using CommonHTTPSolrServer:



  ContentStreamBase.URLStream csbu = new ContentStreamBase.URLStream(url);
  InputStream is = csbu.getStream();
  FastInputStream fis = new FastInputStream(is);

  csur.addContentStream(csbu);
   csur.setParam("literal.content_id", "00");
   csur.setParam("literal.contentitle", "This is a test");
   csur.setParam("literal.title", "This is a test");
  server.request(csur);
  server.commit();

  fis.close();


This works fine for one document (a pdf in this case).  When I surround 
this with a while loop and try adding multiple documents I get:


org.apache.solr.client.solrj.SolrServerException: java.io.IOException: 
stream is closed


I've tried commenting out the fis.close, and also using just a plain 
InputStream with and without a .close() call - neither work.  Is there a 
way to do this that I'm missing?



Thanks - Tod


Re: Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod

Answering my own question.

ContentStreamUpdateRequest (csur) needs to be within the while loop not 
outside as I had it.  Still not seeing any dramatic performance 
improvements over perl though (the point of this exercise).  Indexing 
locks after about 30-45 minutes of activity, even a commit won't budge it.
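
For anyone following along, the corrected loop shape looks roughly like this; the collection name, id scheme, and server URL are placeholders, not the original code:

import java.net.URL;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

   void indexAll(List<String> docUrls) throws Exception {
       SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
       int id = 0;
       for (String docUrl : docUrls) {
           // A fresh request per document; reusing one csur across iterations is
           // what produced the "stream is closed" IOException above.
           ContentStreamUpdateRequest csur =
               new ContentStreamUpdateRequest("/update/extract");
           csur.addContentStream(new ContentStreamBase.URLStream(new URL(docUrl)));
           csur.setParam("literal.content_id", String.valueOf(id++)); // placeholder ids
           server.request(csur);
       }
       server.commit();   // one commit at the end rather than per document
   }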




On 11/04/2011 12:36 PM, Tod wrote:

This is a code fragment of how I am doing a ContentStreamUpdateRequest
using CommonHTTPSolrServer:


ContentStreamBase.URLStream csbu = new ContentStreamBase.URLStream(url);
InputStream is = csbu.getStream();
FastInputStream fis = new FastInputStream(is);

csur.addContentStream(csbu);
csur.setParam("literal.content_id", "00");
csur.setParam("literal.contentitle", "This is a test");
csur.setParam("literal.title", "This is a test");
server.request(csur);
server.commit();

fis.close();


This works fine for one document (a pdf in this case). When I surround
this with a while loop and try adding multiple documents I get:

org.apache.solr.client.solrj.SolrServerException: java.io.IOException:
stream is closed

I've tried commenting out the fis.close, and also using just a plain
InputStream with and without a .close() call - neither work. Is there a
way to do this that I'm missing?


Thanks - Tod




can solr follow and index hyperlinks embedded in rich text documents (pdf, doc, etc)?

2011-10-21 Thread Tod
I have a feeling the answer is no since you wouldn't want to start 
indexing a large volume of office documents containing hyperlinks that 
could lead all over the internet.  But, since there might be a use case 
like "a customer just asked me if it could be done?", I thought I would 
make sure.



Thanks - Tod


Re: java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

2011-10-21 Thread Tod

On 10/19/2011 2:58 PM, Tim wrote:

Hi Tod,

I had similar issue with slf4j, but it was NoClassDefFound. Do you
have some other dependencies in your application that use some other
version of slf4j? You can use mvn dependency:tree to get all
dependencies in your application. Or maybe there's some other version
already in your tomcat or application server.

/Tim


I had to start over from scratch but I believe that's exactly what it 
was.  Things are working now.


Thanks.


java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

2011-10-19 Thread Tod
I'm working on upgrading to Solr 3.4.0 and am seeing this error in my 
tomcat log.  I'm using the following slf jars:


slf4j-api-1.6.1.jar
slf4j-jdk14-1.6.1.jar

Has anybody run into this?  I can reproduce it doing curl calls to the 
Solr ExtractingRequestHandler ala /solr/update/extract.


TIA - Tod


Re: Instructions for Multiple Server Webapps Configuring with JNDI

2011-10-18 Thread Tod

On 10/14/2011 2:44 PM, Chris Hostetter wrote:


: modified the solr/home accordingly.  I have an empty directory under
: tomcat/webapps named after the solr home directory in the context fragment.

if that empty directory has the same base name as your context fragment
(ie: tomcat/webapps/solr0 and solr0.xml) that may give you problems
... the entire point of using context fragment files is to define webapps
independently of a simple directory based hierarchy in tomcat/webapps ...
if you have a directory there with the same name you create a conflict --
which webapp should it use, the empty one, or the one specified by your
context file?



Looks like that was the problem, once I removed the ./webapps/solr0 
directory and started tomcat back up it was recreated correctly.





: I expected to fire up tomcat and have it unpack the war file contents into the
: solr home directory specified in the context fragment, but its empty, as is
: the webapps directory.

that's not what the solr/home env variable is for at all.  tomcat will
put the unpacked war where ever it needs/wants to (in theory it could just
load it in memory) ... the point of the solr/home env variable is for you
to tell the solr.war where to find the configuration files for this
context.


Sorry, my mistake.  I wasn't referring to solr/home I was referring 
literally to the new solr home under tomcat - in this instance 
./webapps/solr0.


One more question, is there a particular advantage of multiple solr 
instances vs. multiple solr cores?



Thanks.


Please help - Solr Cell using 'stream.url'

2011-10-07 Thread Tod
I'm batching documents into solr using solr cell with the 'stream.url' 
parameter.  Everything is working fine until I get to about 5k documents 
in and then it starts issuing 'read timeout 500' errors on every document.


The sysadmin says there's plenty of CPU, memory, and no paging so it 
doesn't look like the OS is the problem.  I can curl the documents that 
Solr is trying to index and failing just fine so it seems to be a Solr 
issue.  There's only about 35K documents total so Solr shouldn't even blink.


Can anyone help me diagnose this problem?  I'd be happy to provide any 
more detail that is needed.



Thanks - Tod


Solr read timeout

2011-08-18 Thread Tod
I'm using perl to indirectly call the solr ExtractingRequestHandler to 
stream remote documents into a solr index instance.  Every 100 URL's I 
process I do a commit.  I've got about 30K documents to be indexed.  I'm 
using a stock, out of the box version of solr 1.4.1 with the necessary 
schema changes for the fields I'm indexing.


I seem to be running into performance problems about 40 documents in.  I 
start getting Failed: 500 read timeouts that last about 4 minutes each 
slowing processing down to a crawl.  I've tried a later version of tika 
(0.8) and that didn't seem to help.  I'm also not sure it's the problem.


Given I'm using a pretty much unaltered version of Solr could it be my 
problem?  I'm running everything under a typical Tomcat install on a 
Linux VM.  I understand there are performance tweaks I can make to the 
Solr config but would like to focus them first on resolving this problem 
rather than blanket tweaking the entire config.


Is there anything in particular I should look at?  Can I provide any 
more information?



Thanks - Tod


Most current Tika jar files that work with Solr 1.4.1

2011-08-17 Thread Tod
What is the latest version of Tika that I can use with Solr 1.4.1?  It 
comes packaged with 0.4.  I tried 0.8 and it no workie.


Re: ContentStreamLoader Problem

2011-07-13 Thread Tod

On 07/12/2011 6:52 PM, Erick Erickson wrote:

This is a shot in the dark, but this smells like a classpath issue,
and since you have
a 1.4.1 installation on the machine, I'm *guessing* that you're getting a mix of
old and new Jars. What happens if you try this on a machine that doesn't have
1.4.1 on it? If that works, then it's likely a classpath issue

Best
Erick


I'll give it a shot and report back.


Thanks - Tod


ContentStreamLoader Problem

2011-07-12 Thread Tod
I'm getting this error testing Solr V3.3.0 using the 
ExtractingRequestHandler.  I'm taking advantage of the REST interface 
and batching my documents in using stream.url.   It happens for every 
document I try to index.  It works fine under Solr 1.4.1.


I'm running everything under Tomcat.  I already have an existing 1.4.1 
instance running, could that be causing the problem?



Thanks - Tod




Jul 12, 2011 1:11:31 PM 
org.apache.solr.update.processor.LogUpdateProcessor finish

INFO: {} 0 1
Jul 12, 2011 1:11:31 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.AbstractMethodError: 
org/apache/solr/handler/ContentStreamLoader.load(Lorg/apache/solr/request/SolrQueryRequest;Lorg/apache/solr/response/SolrQueryResponse;Lorg/apache/solr/common/util/ContentStream;)V
	at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
	at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
	at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
	at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
	at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
	at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
	at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)

at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:811)



tika.parser.AutoDetectParser

2011-07-01 Thread Tod
I'm working on upgrading to v3.2 from v 1.4.1.  I think I've got 
everything working but when I try to do a data import using 
dataimport.jsp I'm rolling back and getting class not found exception on 
the above referenced class.


I thought that tika was packaged up with the base Solr build now but 
this message seems to contradict that unless I'm missing a jar 
somewhere.  I've got both dataimporthandler jar files in my WEB-INF/lib 
dir so not sure what I could be missing.  Any ideas?



Thanks - Tod


Re: tika.parser.AutoDetectParser

2011-07-01 Thread Tod

On 07/01/2011 12:59 PM, Shawn Heisey wrote:

On 7/1/2011 9:23 AM, Tod wrote:

I'm working on upgrading to v3.2 from v 1.4.1. I think I've got
everything working but when I try to do a data import using
dataimport.jsp I'm rolling back and getting class not found exception
on the above referenced class.

I thought that tika was packaged up with the base Solr build now but
this message seems to contradict that unless I'm missing a jar
somewhere. I've got both dataimporthandler jar files in my WEB-INF/lib
dir so not sure what I could be missing. Any ideas?


Tika is included in the solr download, but it's not included in the .war
or any of the other files in the dist directory. You may have noticed
that you now have to include one or more jars for the dataimport
handler. If you copy the following files from the solr download to the
same place you have apache-solr-dataimporthandler-3.2.0.jar, you should
be OK.

contrib/extraction/lib/tika-core-0.8.jar
contrib/extraction/lib/tika-parsers-0.8.jar

Thanks,
Shawn





Got them, thanks Shawn.


Re: Default schema - 'keywords' not multivalued

2011-06-29 Thread Tod

On 06/28/2011 12:04 PM, Chris Hostetter wrote:


: I'm streaming over the document content (presumably via tika) and its
: gathering the document's metadata which includes the keywords metadata field.
: Since I'm also passing that field from the DB to the REST call as a list (as
: you suggested) there is a collision because the keywords field is single
: valued.
:
: I can change this behavior using a copy field.  What I wanted to know is if
: there was a specific reason the default schema defined a field like keywords
: single valued so I could make sure I wasn't missing something before I changed
: things.

That file is just an example, you're absolutely free to change it to meet
your use case.

I'm not very familiar with Tika, but based on the comment in the example
config...

<!-- Common metadata fields, named specifically to match up with
  SolrCell metadata when parsing rich documents such as Word, PDF.
  Some fields are multiValued only because Tika currently may return
  multiple values for them.
-->

...i suspect it was intentional that that field is *not* multiValued (i
guess Tika always returns a single delimited value?) but if you have
multiple discrete values you want to send for your DB backed data there is
no downside to changing that.

: While I'm at it, I'd REALLY like to know how to use DIH to index the metadata
: from the database while simultaneously streaming over the document content and
: indexing it.  I've never quite figured it out yet but I have to believe it is
: a possibility.

There's a TikaEntityProcessor that can be used to have Tika crunch the
data that comes from an entity and extract out specific fields, and it
can be used in combination with a JdbcDataSource and a BinFileDataSource
so that a field in your db data specifies the name of a file on disk to
use as the TikaEntity -- but i've personally never tried it

Here's a simple example someone posted last year that they got working...

http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html



-Hoss



Thanks Hoss, I'll just change the schema then.

The problem with TikaEntityProcessor is this installation is still 
running v1.4.1 so I'll need to upgrade.


Any short and sweet instructions for upgrading to 3.2?  I have a pretty 
straight forward Tomcat install, would just dropping in the new war suffice?



- Tod


Re: Default schema - 'keywords' not multivalued

2011-06-28 Thread Tod

On 06/27/2011 11:23 AM, lee carroll wrote:

Hi Tod,
A list of keywords would be fine in a non multi valued field:

keywords : xxx yyy sss aaa 

multi value field would allow you to repeat the field when indexing

keywords: xxx
keywords: yyy
keywords: sss
etc



Thanks Lee. the problem is I'm manually pushing a document (via 
stream.url) and its metadata from a database with the Solr 
/update/extract REST service, HTTP GET, using Perl.


I'm streaming over the document content (presumably via tika) and its 
gathering the document's metadata which includes the keywords metadata 
field.  Since I'm also passing that field from the DB to the REST call 
as a list (as you suggested) there is a collision because the keywords 
field is single valued.


I can change this behavior using a copy field.  What I wanted to know is 
if there was a specific reason the default schema defined a field like 
keywords single valued so I could make sure I wasn't missing something 
before I changed things.


While I'm at it, I'd REALLY like to know how to use DIH to index the 
metadata from the database while simultaneously streaming over the 
document content and indexing it.  I've never quite figured it out yet 
but I have to believe it is a possibility.



- Tod


Default schema - 'keywords' not multivalued

2011-06-27 Thread Tod
This was a little curious to me and I wondered what the thought process 
was behind it before I decide to change it.



Thanks - Tod


Tika Jax-RS and DIH

2011-06-22 Thread Tod

Mattmann, Chris A (388J) <chris.a.mattmann at jpl.nasa.gov> writes:



 Hi Jo,

 You may consider checking out Tika trunk, where we recently have a Tika JAX-RS

web service [1] committed as

 part of the tika-server module. You could probably wire DIH into it and

accomplish the same thing.


 Cheers,
 Chris

 [1] https://issues.apache.org/jira/browse/TIKA-593



Chris - could you elaborate on using Tika Jax-RS and DIH?  How 
production ready is it?  Could you summarize the steps necessary to get 
it to work?  Any examples yet?


I'd be happy to work with you to get something out to the group.


Thanks - Tod


Indexing Mediawiki

2011-06-07 Thread Tod
I have a need to index an internal instance of Mediawiki.  I'd like to 
use DIH if I can since I have access to the database but the example 
provided on the Solr wiki uses a Mediawiki dump XML file.


Does anyone have any experience using DIH in this manner?  Am I barking 
up the wrong tree and would be better off dumping and indexing the wiki 
instead?




Thanks - Tod


Can ExtractingRequestHandler ignore documents metadata

2011-05-09 Thread Tod
I'm indexing content from a CMS' database of metadata.  The client would 
prefer that Solr exclude the properties (metadata) of any documents 
being indexed.  Is there a way to tell Tika to only index a document's 
text and not its properties?


Thanks - Tod


Opensearch Format Support

2011-01-20 Thread Tod
Does Solr support the Opensearch format?  If so could someone point me 
to the correct documentation?



Thanks - Tod


Re: Retrieving indexed content containing multiple languages

2010-11-16 Thread Tod

On 11/11/2010 3:24 PM, Dennis Gearon wrote:

I look forward to the answers to this one.


Well, it seems it was as easy as adding the CJKTokenizerFactory:

<fieldtype name="text_cjk" class="solr.TextField"
    positionIncrementGap="100">

 <analyzer>
  <tokenizer class="solr.CJKTokenizerFactory"/>
 </analyzer>
</fieldtype>


Once I did that and reindexed I could search for both english and 
chinese using the default 'text' field.  The next hurdle was getting the 
javascript to cooperate.  The chinese characters were getting corrupted 
on the way to the AJAX call against the Solr server.


As it turned out I was performing a POST to Solr using the jQuery .ajax 
api call.  Apparently when executing a POST you need to make sure the 
characters entered into the input field of the form are converted to 
unicode (\u7968 for example) prior to the AJAX call to Solr. 
Conversely, if executing a GET you need to convert the characters to 
UTF8 (%E7%A5%A8).


So now my customers are happily finding the appropriate document using 
english and chinese.


If someone could check my math I would appreciate it.  If it looks 
reasonable and there is nothing else written about it on the wiki I'll 
create a tutorial to give everybody else a leg up.



- Tod




- Original Message 
From: Tod <listac...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Thu, November 11, 2010 11:35:23 AM
Subject: Retrieving indexed content containing multiple languages

My Solr corpus is currently created by indexing metadata from a relational
database as well as content pointed to by URLs from the database.  I'm using a
pretty generic out of the box Solr schema.  The search results are presented via
an AJAX enabled HTML page.

When I perform a search the document title (for example) has a mix of english
and chinese characters.  Everything there is fine - I can see the english and
chinese returned from a facet query on title.  I can search against the title
using english words it contains and I get back an expected result.  I asked a
chinese friend to perform the same search using chinese and nothing is returned.

How should I go about getting this search to work?  Chinese is just one
language, I'll probably need to support more in the future.

My thought is that the chinese characters are indexed as their unicode
equivalent so all I'll need to do is make sure the query is encoded
appropriately and just perform a regular search as I would if the terms were in
english.  For some reason that sounds too easy.

I see there is a CJK tokenizer that would help here.  Do I need that for my
situation?  Is there a fairly detailed tutorial on how to handle these types of
language challenges?


Thanks in advance - Tod






Re: Any Copy Field Caveats?

2010-11-11 Thread Tod

I've noticed that using camelCase in field names causes problems.


On 11/5/2010 11:02 AM, Will Milspec wrote:

Hi all,

we're moving from an old lucene version to solr  and plan to use the Copy
Field functionality. Previously we had rolled our own implementation,
sticking title, description, etc. in a field called 'content'.

We lose some flexibility (i.e. java layer can no longer control what gets in
the new copied field), at the expense of simplicity. A fair tradeoff IMO.

My question: has anyone found any subtle issues or gotchas with copy
fields?

(from the subject line caveat--pronounced 'kah-VEY-AT'  is Latin as in
Caveat Emptor...let the buyer beware).

thanks,

will

will





Retrieving indexed content containing multiple languages

2010-11-11 Thread Tod
My Solr corpus is currently created by indexing metadata from a 
relational database as well as content pointed to by URLs from the 
database.  I'm using a pretty generic out of the box Solr schema.  The 
search results are presented via an AJAX enabled HTML page.


When I perform a search the document title (for example) has a mix of 
english and chinese characters.  Everything there is fine - I can see 
the english and chinese returned from a facet query on title.  I can 
search against the title using english words it contains and I get back 
an expected result.  I asked a chinese friend to perform the same search 
using chinese and nothing is returned.


How should I go about getting this search to work?  Chinese is just one 
language, I'll probably need to support more in the future.


My thought is that the chinese characters are indexed as their unicode 
equivalent so all I'll need to do is make sure the query is encoded 
appropriately and just perform a regular search as I would if the terms 
were in english.  For some reason that sounds too easy.


I see there is a CJK tokenizer that would help here.  Do I need that for 
my situation?  Is there a fairly detailed tutorial on how to handle 
these types of language challenges?



Thanks in advance - Tod


Chinese characters - a little OT

2010-11-10 Thread Tod

Sorry, OT but it's driving me nuts.

I've indexed a document with chinese characters in its title.  When I 
perform the search (that returns json) I get back the title and using 
Javascript place it into a variable that ultimately ends up as a 
dropdown of titles to choose from.  The problem is the title contains 
the literal unicode representation of the chinese characters (&#20013; 
for example).


Here's the javascript:

 var optionObj=document.createElement('option');

 menuItem=titleArray[1].title;
 menuVal=titleArray[1].url;

 if((menuItem != " ") && (menuItem != "") && (menuItem != null))
  {
   optionObj.appendChild(document.createTextNode(menuItem));
   optionObj.setAttribute('id',optId + optCnt);
   optionObj.setAttribute('target','_blank');
   optionObj.setAttribute('value',menuVal);
   optCnt++;
   selectObj.appendChild(optionObj);
  }

My hunch is I should utf-8 encode the title and then try and display the 
result but it's not working.  I still am seeing the unicode characters.


Does anyone see what I could be doing wrong?

TIA - Tod


Re: Phrase Query Problem?

2010-11-02 Thread Tod

On 11/1/2010 11:14 PM, Ken Stanley wrote:

On Mon, Nov 1, 2010 at 10:26 PM, Tod <listac...@gmail.com> wrote:


I have a number of fields I need to do an exact match on.  I've defined
them as 'string' in my schema.xml.  I've noticed that I get back query
results that don't have all of the words I'm using to search with.

For example:


q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json

Should, with an exact match, return only one entry but it returns five some
of which don't have any of the fields I've specified.  I've tried this both
with and without quotes.

What could I be doing wrong?


Thanks - Tod




Tod,

Without knowing your exact field definition, my first guess would be your
first boolean query; because it is not quoted, what SOLR typically does is
to transform that type of query into something like (assuming your uniqueKey
is id): (mykeywords:Compliance id:With id:Conduct id:Standards). If you do
(mykeywords:Compliance+With+Conduct+Standards) you might see different
(better?) results. Otherwise, append &debugQuery=on to your URL and you can
see exactly how SOLR is parsing your query. If none of that helps, what is
your field definition in your schema.xml?

- Ken



The field definition is:

<field name="mykeywords" type="string" indexed="true" stored="true" 
multiValued="true"/>


The request:

select?q=(((mykeywords:Compliance+With+Attorney+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&fl=mykeywords&start=0&indent=true&wt=json&debugQuery=on

The response looks like this:

 responseHeader:{
  status:0,
  QTime:8,
  params:{
wt:json,
q:(((mykeywords:Compliance With Attorney Conduct 
Standards)OR(mykeywords:All)OR(mykeywords:ALL))),

start:0,
indent:true,
fl:mykeywords,
debugQuery:on}},
 response:{numFound:6,start:0,docs:[
{
 mykeywords:[Compliance With Attorney Conduct Standards]},
{
 mykeywords:[Anti-Bribery,Bribes]},
{
 mykeywords:[Marketing Guidelines,Marketing]},
{},
{
 mykeywords:[Anti-Bribery,Due Diligence]},
{
 mykeywords:[Anti-Bribery,AntiBribery]}]
 },
 debug:{
  rawquerystring:(((mykeywords:Compliance With Attorney Conduct 
Standards)OR(mykeywords:All)OR(mykeywords:ALL))),
  querystring:(((mykeywords:Compliance With Attorney Conduct 
Standards)OR(mykeywords:All)OR(mykeywords:ALL))),
  parsedquery:(mykeywords:Compliance text:attorney text:conduct 
text:standard) mykeywords:All mykeywords:ALL,
  parsedquery_toString:(mykeywords:Compliance text:attorney 
text:conduct text:standard) mykeywords:All mykeywords:ALL,

  explain:{
...

As you mentioned, looking at the parsed query it's breaking the request 
up on word boundaries rather than on the entire phrase.  The goal is to 
return only the very first entry.  Any ideas?



Thanks - Tod


Re: Phrase Query Problem?

2010-11-02 Thread Tod

On 11/2/2010 9:21 AM, Ken Stanley wrote:

On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson <erickerick...@gmail.com> wrote:


That's not the response I get when I try your query, so I suspect
something's not quite right with your test...

But you could also try putting parentheses around the words, like
mykeywords:(Compliance+With+Conduct+Standards)

Best
Erick



I agree with Erick, your query string showed quotes, but your parsed query
did not. Using quotes, or parenthesis, would pretty much leave your query
alone. There is one exception that I've found: if you use a stopword
analyzer, any stop words would be converted to ? in the parsed query. So if
you absolutely need every single word to match, regardless, you cannot use a
field type that uses the stop word analyzer.

For example, I have two dynamic field definitions: df_text_* that does the
default text transformations (including stop words), and df_text_exact_*
that does nothing (field type is string). When I run the
query df_text_exact_company_name:Bank of America OR
df_text_company_name:Bank of America, the following is shown as my
query/parsed query when debugQuery is on:

str name=rawquerystring
df_text_exact_company_name:Bank of America OR df_text_company_name:Bank
of America
/str
str name=querystring
df_text_exact_company_name:Bank of America OR df_text_company_name:Bank
of America
/str
str name=parsedquery
df_text_exact_company_name:Bank of America
PhraseQuery(df_text_company_name:bank ? america)
/str
str name=parsedquery_toString
df_text_exact_company_name:Bank of America df_text_company_name:bank ?
america
/str

The difference is subtle, but important. If I were to do
df_text_company_name:Bank and America, I would still match Bank of
America. These are things that you should keep in mind when you are
creating fields for your indices.

A useful tool for seeing what SOLR does to your query terms is the Analysis
tool found in the admin panel. You can do an analysis on either a specific
field, or by a field type, and you will see a breakdown by Analyzer for
either the index, query, or both of any query that you put in. This would
definitely be useful when trying to determine why SOLR might return what it
does.

- Ken



What it turned out to be was escaping the spaces.

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

became

q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

If I tried

q=(((mykeywords:"Compliance+With+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL)))

... it didn't work.  Once I removed the quotes and escaped spaces it 
worked as expected.  This seems odd since I would have expected the 
quotes to have triggered a phrase query.
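
For reference, when the request is built with SolrJ rather than a hand-assembled URL, the client handles the URL encoding and a quoted phrase should behave as expected; a small sketch, not the original code:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

   QueryResponse queryExact(SolrServer server) throws Exception {
       SolrQuery query = new SolrQuery(
           "mykeywords:\"Compliance With Conduct Standards\" OR mykeywords:All OR mykeywords:ALL");
       query.setStart(0);
       return server.query(query);   // getResults().getNumFound() should be 1 here
   }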


Thanks for your help.

- Tod


Facet count of zero

2010-11-01 Thread Tod
I'm trying to exclude certain facet results from a facet query.  It 
seems to work but rather than being excluded from the facet list its 
returned with a count of zero.


Ex: 
q=(-foo:bar)&facet=true&facet.field=foo&facet.sort=idx&wt=json&indent=true


This returns bar with a count of zero.  All the other foo's show up with 
valid counts.


Can I do this?  Is my syntax incorrect?



Thanks - Tod


Re: Facet count of zero

2010-11-01 Thread Tod

On 11/1/2010 1:03 PM, Yonik Seeley wrote:

On Mon, Nov 1, 2010 at 12:55 PM, Tod <listac...@gmail.com> wrote:

I'm trying to exclude certain facet results from a facet query. It seems to
work but rather than being excluded from the facet list it's returned with a
count of zero.


If you don't want to see 0 counts, use facet.mincount=1

http://wiki.apache.org/solr/SimpleFacetParameters

-Yonik
http://www.lucidimagination.co



Ex:
q=(-foo:bar)&facet=true&facet.field=foo&facet.sort=idx&wt=json&indent=true

This returns bar with a count of zero. All the other foo's show up with
valid counts.

Can I do this? Is my syntax incorrect?



Thanks - Tod





Excellent, I completely missed it - thanks!


Phrase Query Problem?

2010-11-01 Thread Tod
I have a number of fields I need to do an exact match on.  I've defined 
them as 'string' in my schema.xml.  I've noticed that I get back query 
results that don't have all of the words I'm using to search with.


For example:

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json

Should, with an exact match, return only one entry but it returns five 
some of which don't have any of the fields I've specified.  I've tried 
this both with and without quotes.


What could I be doing wrong?


Thanks - Tod



Overriding Tika's field processing

2010-10-28 Thread Tod
I'm reading my document data from a CMS and indexing it using calls to 
curl.  The curl call includes 'stream.url' so Tika will also index the 
actual document pointed to by the CMS' stored url.  This works fine.


Presentation side I have a dropdown with the title of all the indexed 
documents such that when a user clicks one of them it opens in a new 
window.  Using js, I've been parsing the json returned from Solr to 
create the dropdown.  The problem is I can't get the titles sorted 
alphabetically.


If I use a facet.sort on the title field I get back ALL the sorted 
titles in the facet block, but that doesn't include the associated 
URL's.  A sorted query won't work because title is a multivalued field.


The one option I can think of is to make the title single valued so that 
I have a one to one relationship to the returned url.  To do that I'd 
need to be able to *not* index the Tika returned values.


If I read right, my understanding was that I could use 'literal.title' 
in the curl call to limit what would be included in the index from Tika. 
 That doesn't seem to be working as a test facet query returns more 
than I have in the CMS.


Am I understanding the 'literal.title' processing correctly?  Does 
anybody have experience/suggestions on how to handle this?



Thanks - Tod



Re: UpdateXmlMessage

2010-10-04 Thread Tod

On 10/1/2010 11:33 PM, Lance Norskog wrote:

Yes. stream.file and stream.url are independent of the request handler.
They do their magic at the very top level of the request.

However, there are no unit tests for these features, but they are widely
used.



Sorry Lance, are you agreeing that I can't or that I can?  If I can, I'm 
doing something wrong.  I'm specifying stream.url as its own field in 
the XML like:


<add>
 <doc>
  <field name="author">I am the author</field>
  <field name="title">I am the title</field>
  <field name="stream.url">http://www.test.com/myOfficeDoc.doc</field>
  .
  .
  .
 </doc>
</add>

The wiki docs were a little sparse on this one.

- Tod





Tod wrote:

I can do this using GET:

http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3Eoffice:Bridgewater%3C/query%3E%3C/delete%3E

http://localhost:8983/solr/update?stream.body=%3Ccommit/%3E

... but can I pass a stream.url parameter using an UpdateXmlMessage? I
looked at the schema and I think the answer is no but just wanted to
check.


TIA






UpdateXmlMessage

2010-10-01 Thread Tod

I can do this using GET:

http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3Eoffice:Bridgewater%3C/query%3E%3C/delete%3E
http://localhost:8983/solr/update?stream.body=%3Ccommit/%3E

... but can I pass a stream.url parameter using an UpdateXmlMessage?  I 
looked at the schema and I think the answer is no but just wanted to check.



TIA


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-19 Thread Tod

On 8/19/2010 1:45 AM, Lance Norskog wrote:

'stream.url' is just a simple parameter. You should be able to just
add it directly.



I agree (code excluding imports):

public class CommonTest {

  public static void main(String[] args) {
    System.out.println("main...");
    try {
      String fileName = "http://remoteserver/test/test.pdf";

      String solrId = "1234";
      indexFilesSolrCell(fileName, solrId);

    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }

  /**
   * Method to index all types of files into Solr.
   * @param fileName
   * @param solrId
   * @throws IOException
   * @throws SolrServerException
   */
  public static void indexFilesSolrCell(String fileName, String solrId)
    throws IOException, SolrServerException {

    System.out.println("indexFilesSolrCell...");

    String urlString = "http://localhost:9080/solr";

    System.out.println("getting connection...");
    SolrServer solr = new CommonsHttpSolrServer(urlString);

    System.out.println("getting updaterequest handle...");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");

    System.out.println("setting params...");
    req.setParam("stream.url", fileName);
    req.setParam("literal.content_id", solrId);

    System.out.println("making request...");
    solr.request(req);

    System.out.println("committing...");
    solr.commit();

    System.out.println("done...");
  }
}


At the "making request" step I get:

java.lang.NullPointerException
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:381)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)

at CommonTest.indexFilesSolrCell(CommonTest.java:59)
at CommonTest.main(CommonTest.java:26)

... which is pointing to the solr.request(req) line.



Thanks - Tod


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-18 Thread Tod

On 8/16/2010 6:12 PM, Chris Hostetter wrote:

:  I think your problem may be that StreamingUpdateSolrServer buffers up
:  commands and sends them in batches in a background thread.  if you want to
:  send individual updates in real time (and time them) you should just use
:  CommonsHttpSolrServer
: 
: My goal is to batch updates.  My content lives somewhere else so I was trying

: to find a way to tell Solr where the document lived so it could go out and
: stream it into the index for me.  That's where I thought
: StreamingUpdateSolrServer would help.

If your content lives on a machine which is neither your client nor your 
server, and you want your client to tell your server to go fetch it 
directly, then the stream.url param is what you need -- that is unrelated 
to whether you use StreamingUpdateSolrServer or not.



Do you happen to have a code fragment laying around that demonstrates 
using CommonsHttpSolrServer and stream.url?  I've tried it in 
conjunction with ContentStreamUpdateRequest and I keep getting an 
annoying null pointer exception.  In the meantime I will check the 
examples...




Thinking about it some more, I suspect the reason you might be seeing a 
delay when using StreamingUpdateSolrServer is because of this bug...


   https://issues.apache.org/jira/browse/SOLR-1990

...if there are no actual documents in your UpdateRequest (because you are 
using the stream.url param) then the StreamingUpdateSolrServer blocks 
until all other requests are done, then delegates to the super class (so 
it never actually puts your indexing requests in a buffered queue, it just 
delays and then does them immediately).


Not sure of a good way around this off the top of my head, but I'll note 
it in SOLR-1990 as another problematic use case that needs to be dealt with.


Perhaps I can execute an initial update request using a benign file 
before making the stream.url call?
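
Another thought - purely a sketch, under the assumption that only the 
parameter-only stream.url requests are affected by SOLR-1990 - would be to keep 
StreamingUpdateSolrServer for batches that actually carry documents, and push 
the extract requests through a plain CommonsHttpSolrServer so they don't wait 
behind the streaming queue:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SplitServersExample {
  public static void main(String[] args) throws Exception {
    // Streaming server for requests that actually carry documents
    // (documents built client-side would still go through this one).
    SolrServer streaming = new StreamingUpdateSolrServer("http://localhost:8080/solr", 100, 5);
    // Plain HTTP server for the parameter-only extract requests, so they
    // are not delayed behind the queue described in SOLR-1990.
    SolrServer direct = new CommonsHttpSolrServer("http://localhost:8080/solr");

    UpdateRequest extract = new UpdateRequest("/update/extract");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("stream.url", "http://remote_server.mydomain.com/test.pdf");
    params.set("literal.content_id", "1234");
    extract.setParams(params);

    direct.request(extract);
    direct.commit();
  }
}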


Also, to beat a dead horse, this:
'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'

... works fine - I just want to do it a LOT and as efficiently as 
possible.  If I have to I can wrap it in a perl script and run a cURL or 
LWP loop but I'd prefer to use SolrJ if I can.


Thanks for all your help.


- Tod


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-13 Thread Tod

On 8/12/2010 8:02 PM, Chris Hostetter wrote:

: It returns in around a second.  When I execute the attached code it takes just
: over three minutes.  The optimal for me would be able get closer to the
: performance I'm seeing with curl using Solrj.

I think your problem may be that StreamingUpdateSolrServer buffers up 
commands and sends them in batches in a background thread.  if you want to 
send individual updates in real time (and time them) you should just use 
CommonsHttpSolrServer



-Hoss



My goal is to batch updates.  My content lives somewhere else so I was 
trying to find a way to tell Solr where the document lived so it could 
go out and stream it into the index for me.  That's where I thought 
StreamingUpdateSolrServer would help.


- Tod


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-06 Thread Tod

On 8/4/2010 11:11 PM, jayendra patil wrote:

ContentStreamUpdateRequest seems to read the file contents and transfer it
over http, which slows down the indexing.

Try Using StreamingUpdateSolrServer with stream.file param @
http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post

e.g.

SolrServer server = new StreamingUpdateSolrServer("Solr Server URL", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("stream.file", new String[]{"local file path"});
params.set("literal.id", "value");
req.setParams(params);
server.request(req);
server.commit();


Thanks for your suggestions.  Unfortunately, I'm still seeing poor 
performance.


To be clear, I am trying to have SOLR index multiple documents that 
exist on a remote server.  I'd prefer that SOLR stream the documents 
after I pass it a pointer to them, rather than me retrieving and pushing 
them, so I can avoid the network overhead.


When I do this:

curl 
'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'


It returns in around a second.  When I execute the attached code it 
takes just over three minutes.  The optimal outcome for me would be to 
get closer to the performance I'm seeing with curl, but using SolrJ.


To be fair, the SOLR server I am using is really a workstation-class 
machine, plus I am still learning.  I have a feeling I'm doing something 
dumb but just can't seem to pinpoint the exact problem.



Thanks - Tod


code---


import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;

import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;


/**
 * @author EDaniel
 */
public class SolrExampleTests {

  public static void main(String[] args) {
    System.out.println("main...");
    try {
//    String fileName = "/test/test.pdf";
      String fileName = "http://remoteserver/test/test.pdf";
      String solrId = "1234";
      indexFilesSolrCell(fileName, solrId);

    } catch (Exception ex) {
      System.out.println(ex.toString());
    }
  }

  /**
   * Method to index all types of files into Solr.
   * @param fileName
   * @param solrId
   * @throws IOException
   * @throws SolrServerException
   */
  public static void indexFilesSolrCell(String fileName, String solrId)
      throws IOException, SolrServerException {

    System.out.println("indexFilesSolrCell...");

    String urlString = "http://localhost:8080/solr";

    System.out.println("getting connection...");
    //SolrServer solr = new CommonsHttpSolrServer(urlString);
    SolrServer solr = new StreamingUpdateSolrServer(urlString, 100, 5);

    System.out.println("getting updaterequest handle...");
    //ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

    UpdateRequest up = new UpdateRequest("/update/extract");

    ModifiableSolrParams params = new ModifiableSolrParams();
    //params.add("stream.file", fileName);
    params.add("stream.url", fileName);
    params.set("literal.content_id", solrId);
    up.setParams(params);

    System.out.println("making request...");
    solr.request(up);

    System.out.println("committing...");
    solr.commit();

    System.out.println("done...");
  }
}


Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread Tod
I'm running a slight variation of the example code referenced below and 
it takes a really long time to finally execute.  In fact it hangs for a 
long time at solr.request(up) before finally executing.  Is there 
anything I can look at or tweak to improve performance?


I am also indexing a local pdf file, there are no firewall issues, solr 
is running on the same machine, and I tried the actual host name in 
addition to localhost but nothing helps.



Thanks - Tod

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample


Supplementing already indexed data

2010-07-11 Thread Tod
I'm getting metadata from an RDB but the actual content is stored 
somewhere else.  I'd like to index the content too, but I don't want to 
overlay the already-indexed metadata.  I know this can be done but I 
just can't seem to dig up the correct docs - can anyone point me in the 
right direction?
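
One caveat worth noting: as far as I know, Solr replaces a document wholesale 
when you re-add it with the same uniqueKey, so "supplementing" generally means 
re-sending the full document.  A rough sketch of doing that by reading the 
stored metadata back out and re-adding it together with the new content - it 
assumes every metadata field is stored="true" and that content_id is the 
uniqueKey, neither of which is stated above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class SupplementExample {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Fetch the already-indexed metadata document (assumes it exists).
    SolrDocument existing = solr.query(new SolrQuery("content_id:1234"))
                                .getResults().get(0);

    // Copy its stored fields and append the newly extracted content.
    SolrInputDocument doc = new SolrInputDocument();
    for (String field : existing.getFieldNames()) {
      for (Object value : existing.getFieldValues(field)) {
        doc.addField(field, value);
      }
    }
    doc.addField("content", "...text extracted from the remote document...");

    // Re-adding with the same uniqueKey replaces the old document wholesale.
    solr.add(doc);
    solr.commit();
  }
}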



Thanks.


Re: Data Import Handler Rich Format Documents

2010-07-06 Thread Tod

On 6/28/2010 8:28 AM, Alexey Serba wrote:

Ok, I'm trying to integrate the TikaEntityProcessor as suggested.  I'm using
Solr Version: 1.4.0 and getting the following error:

java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
org.apache.solr.handler.dataimport.BinURLDataSource

It seems that DIH-Tika integration is not a part of Solr 1.4.0/1.4.1
release. You should use trunk / nightly builds.
https://issues.apache.org/jira/browse/SOLR-1583



Thanks, that would explain things - I'm using a stock 1.4.0 download.



My data-config.xml looks like this:

<dataConfig>
  <dataSource type="JdbcDataSource"
     driver="oracle.jdbc.driver.OracleDriver"
     url="jdbc:oracle:thin:@whatever:12345:whatever"
     user="me"
     name="ds-db"
     password="secret"/>

  <dataSource type="BinURLDataSource"
     name="ds-url"/>

  <document>
    <entity name="my_database"
        dataSource="ds-db"
        query="select * from my_database where rownum &lt;=2">
      <field column="CONTENT_ID"  name="content_id"/>
      <field column="CMS_TITLE"   name="cms_title"/>
      <field column="FORM_TITLE"  name="form_title"/>
      <field column="FILE_SIZE"   name="file_size"/>
      <field column="KEYWORDS"    name="keywords"/>
      <field column="DESCRIPTION" name="description"/>
      <field column="CONTENT_URL" name="content_url"/>
    </entity>

    <entity name="my_database_url"
        dataSource="ds-url"
        query="select CONTENT_URL from my_database where
               content_id='${my_database.CONTENT_ID}'">
      <entity processor="TikaEntityProcessor"
          dataSource="ds-url"
          format="text"
          url="http://www.mysite.com/${my_database.content_url}">
        <field column="text"/>
      </entity>
    </entity>

  </document>
</dataConfig>

I added the <entity name="my_database_url"> section to an existing (working)
database entity to be able to have Tika index the content pointed to by the
content_url.

Is there anything obviously wrong with what I've tried so far?


I think you should move the Tika entity into the my_database entity and
simplify the whole configuration:

<entity name="my_database" dataSource="ds-db"
        query="select * from my_database where rownum &lt;=2">
  ...
  <field column="CONTENT_URL" name="content_url"/>

  <entity processor="TikaEntityProcessor" dataSource="ds-url"
          format="text"
          url="http://www.mysite.com/${my_database.content_url}">
    <field column="text"/>
  </entity>
</entity>



This, I guess, would be after I checked out and built from trunk?


Thanks - Tod


Indexing Rich Format Documents using Data Import Handler (DIH) and the TikaEntityProcessor

2010-06-23 Thread Tod

Please refer to this thread for history:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3c4c1b6bb6.7010...@gmail.com%3e


I'm trying to integrate the TikaEntityProcessor as suggested.  I'm using 
Solr Version: 1.4.0 and getting the following error:


java.lang.ClassNotFoundException: Unable to load BinURLDataSource or 
org.apache.solr.handler.dataimport.BinURLDataSource


curl -s http://test.html|curl 
http://localhost:9080/solr/update/extract?extractOnly=true --data-binary 
@-  -H 'Content-type:text/html'


... works fine so presumably my Tika processor is working.


My data-config.xml looks like this:

<dataConfig>
  <dataSource type="JdbcDataSource"
     driver="oracle.jdbc.driver.OracleDriver"
     url="jdbc:oracle:thin:@whatever:12345:whatever"
     user="me"
     name="ds-db"
     password="secret"/>

  <dataSource type="BinURLDataSource"
     name="ds-url"/>

  <document>
    <entity name="my_database"
        dataSource="ds-db"
        query="select * from my_database where rownum &lt;=2">
      <field column="CONTENT_ID"  name="content_id"/>
      <field column="CMS_TITLE"   name="cms_title"/>
      <field column="FORM_TITLE"  name="form_title"/>
      <field column="FILE_SIZE"   name="file_size"/>
      <field column="KEYWORDS"    name="keywords"/>
      <field column="DESCRIPTION" name="description"/>
      <field column="CONTENT_URL" name="content_url"/>
    </entity>

    <entity name="my_database_url"
        dataSource="ds-url"
        query="select CONTENT_URL from my_database where
               content_id='${my_database.CONTENT_ID}'">
      <entity processor="TikaEntityProcessor"
          dataSource="ds-url"
          format="text"
          url="http://www.mysite.com/${my_database.content_url}">
        <field column="text"/>
      </entity>
    </entity>

  </document>
</dataConfig>

I added the <entity name="my_database_url"> section to an existing 
(working) database entity to be able to have Tika index the content 
pointed to by the content_url.


Is there anything obviously wrong with what I've tried so far?  It is not 
working; it keeps rolling back with the error above.



Thanks - Tod


Data Import Handler Rich Format Documents

2010-06-18 Thread Tod
I have a database containing Metadata from a content management system. 
 Part of that data includes a URL pointing to the actual published 
document which can be an HTML file or a PDF, MS Word/Excel/Powerpoint, etc.


I'm already indexing the Metadata and that provides a lot of value.  The 
customer, however, would like the content pointed to by the URL to be 
indexed as well, for more discrete searching.


This article at Lucid:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS

describes the process of coding a custom transformer.  A separate 
article I've read implies Nutch could be used to provide this 
functionality too.


What would be the best and most efficient way to accomplish what I'm 
trying to do?  I have a feeling the Lucid article might be dated and 
there might be ways to do this now without any coding and maybe without 
even needing to use Nutch.  I'm using the current release version of Solr.


Thanks in advance.


- Tod


Re: Data Import Handler Rich Format Documents

2010-06-18 Thread Tod

On 6/18/2010 9:12 AM, Otis Gospodnetic wrote:

Tod,

You didn't mention Tika, which makes me think you are not aware of it...
You could implement a custom Transformer that uses Tika to perform rich doc 
text extraction, just like ExtractingRequestHandler does it (see 
http://wiki.apache.org/solr/ExtractingRequestHandler ).  Maybe you could even 
just call ERH from your Transformer, though that wouldn't be the most efficient.



You're right, sorry.  I have looked at Tika, which I believe is used by 
Nutch too - no?


Implementing a transformer is fine.  I guess I'm being lazy and trying 
to see if a method of doing this has been incorporated into the latest 
Solr release so I can avoid coding for it.
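
For reference, a very rough sketch of the Transformer route Otis describes - it 
assumes the Solr 1.4 DIH jar plus a Tika jar on the classpath, reuses the 
CONTENT_URL column and base URL from the configs elsewhere in this thread, and 
Tika's parse() signature varies a bit between versions:

import java.io.InputStream;
import java.net.URL;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class TikaUrlTransformer extends Transformer {

  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object url = row.get("CONTENT_URL");   // column delivered by the JDBC entity
    if (url == null) {
      return row;
    }
    try {
      InputStream in = new URL("http://www.mysite.com/" + url).openStream();
      try {
        BodyContentHandler text = new BodyContentHandler(-1);  // -1 = no size limit
        new AutoDetectParser().parse(in, text, new Metadata(), new ParseContext());
        row.put("content", text.toString());  // extra column, mapped like any other field
      } finally {
        in.close();
      }
    } catch (Exception e) {
      // Leave the row indexable even if extraction fails; the DIH log is the place to look.
    }
    return row;
  }
}

Wiring it up would presumably be a transformer="TikaUrlTransformer" attribute on 
the my_database entity in data-config.xml, with the produced "content" column 
mapped to a schema field.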








- Original Message 

From: Tod listac...@gmail.com
To: solr-user@lucene.apache.org
Sent: Fri, June 18, 2010 8:51:02 AM
Subject: Data Import Handler Rich Format Documents

I have a database containing Metadata from a content management system.  
Part of that data includes a URL pointing to the actual published document which 
can be an HTML file or a PDF, MS Word/Excel/Powerpoint, etc.


I'm already 
indexing the Metadata and that provides a lot of value.  The customer 
however would like that the content pointed to by the URL also be indexed for 
more discrete searching.


This article at Lucid:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS


describes 
the process of coding a custom transformer.  A separate article I've read 
implies Nutch could be used to provide this functionality too.


What would 
be the best and most efficient way to accomplish what I'm trying to do?  I 
have a feeling the Lucid article might be dated and there might ways to do this 
now without any coding and maybe without even needing to use Nutch.  I'm 
using the current release version of Solr.


Thanks in advance.



- Tod





Re: Data Import Handler Rich Format Documents

2010-06-18 Thread Tod

On 6/18/2010 11:24 AM, Otis Gospodnetic wrote:

Tod,

I don't think DIH can do that, but who knows, let's see what others say.
Yes, Nutch uses TIKA, too.

 Otis


Looks like the ExtractingRequestHandler uses Tika as well.  I might just 
use this, but I'm wondering if there will be a large performance 
difference between using it to batch content in versus rolling my own 
Transformer?



- Tod



Re: Data Import Handler Rich Format Documents

2010-06-18 Thread Tod

On 6/18/2010 2:42 PM, Chris Hostetter wrote:

:  I don't think DIH can do that, but who knows, let's see what others say.

: Looks like the ExtractingRequestHandler uses Tika as well.  I might just use
: this but I'm wondering if there will be a large performance difference between
: using it to batch content in over rolling my own Transformer?

I'm confused ... You're using DIH, and some of your fields are URLs to 
documents that you want to parse with Tika?


Why would you need a custom Transformer?


I started this thread after reading a Lucid article suggesting a custom 
Transformer might be the way to go when using DIH.  My initial question 
was if there was an alternative.


My database contains only Metadata and a reference to the actual content 
(HTML,Office Documents, PDF...) as a URL - not blobs as the Lucid 
article focused on.  What I would like to do is use DIH somehow to index 
the Metadata but also the actual content pointed to by the URL column.


I might be able to do this instead with Nutch (which uses Tika) - I haven't 
thoroughly researched this yet - or I can write a job to pull all the 
URLs out of the database and utilize cURL and the Solr 
ExtractingRequestHandler to push everything into the index.  I just 
wanted to see what everybody else is doing and what my other options 
might be.
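
In case it helps, a sketch of that last option in SolrJ rather than a perl/cURL 
loop - the JDBC URL, table and column names, and the content_id key are 
placeholders borrowed from earlier messages in this thread, not verified details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class UrlBatchIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr");

    // Pull the document URLs (and ids) out of the metadata database.
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection db = DriverManager.getConnection(
        "jdbc:oracle:thin:@whatever:12345:whatever", "me", "secret");
    Statement stmt = db.createStatement();
    ResultSet rs = stmt.executeQuery("select CONTENT_ID, CONTENT_URL from my_database");

    while (rs.next()) {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      // Solr fetches and parses each remote document itself.
      req.setParam("stream.url", "http://www.mysite.com/" + rs.getString("CONTENT_URL"));
      req.setParam("literal.content_id", rs.getString("CONTENT_ID"));
      solr.request(req);
    }
    // One commit at the end rather than per document.
    solr.commit();

    rs.close();
    stmt.close();
    db.close();
  }
}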



Thanks - Tod


Ref:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS


Re: JSON formatted response from SOLR question....

2010-05-11 Thread Tod

Jon,

Yes!!!

Changing rsp.facet_counts.facet_fields.['var'].length to
rsp.facet_counts.facet_fields[var].length did it - voila.

Tripped up on a syntax error, how special.  Just needed another set of 
eyes - thanks.  VelocityResponse duly noted, it will come in handy later.



- Tod

On 5/10/2010 4:55 PM, Jon Baer wrote:

IIRC, I think what we ended up doing in a project was to use the 
VelocityResponseWriter to write the JSON and set the echoParams to read the 
handler setup (and looping through the variables).

In the template you can grab it w/ something like 
$request.params.get("facet_fields") ... I don't remember the exact hack here 
but basically you should also be able to do something like:

rsp.facet_counts.facet_fields['var'].length

In the end w/ some of the nice stuff from the Velocity tools .jar it was easier 
to work w/ the layout needed for plugins.

- Jon

On May 10, 2010, at 10:18 AM, Tod wrote:


I apologize, this is such a JSON/javascript question but I'm stuck and am not 
finding any resources that address this specifically.

I'm doing a faceted search and getting back in my facet_counts.faceted_fields 
response an array of countries.  I'm gathering the count of the array elements 
returned using this notation:

rsp.facet_counts.facet_fields.country.length

... where rsp is the eval'ed JSON response from SOLR.  From there I just loop 
through listing the individual country with its associated count.

The problem I am having is trying to automate this to loop through any one of a 
number of facets contained in my JSON response, not just country.  So instead 
of the above I would have something like:

rsp.facet_counts.facet_fields.VARIABLE.length

... where VARIABLE would be the name of one of the facets passed into a 
javascript function to perform the loop.  None of the javascript examples I can 
find seems to address this.  Has anyone run into this? Is there a better list 
to ask this question?


Thanks in advance.







JSON formatted response from SOLR question....

2010-05-10 Thread Tod
I apologize, this is such a JSON/javascript question but I'm stuck and 
am not finding any resources that address this specifically.


I'm doing a faceted search and getting back in my 
facet_counts.faceted_fields response an array of countries.  I'm 
gathering the count of the array elements returned using this notation:


rsp.facet_counts.facet_fields.country.length

... where rsp is the eval'ed JSON response from SOLR.  From there I just 
loop through listing the individual country with its associated count.


The problem I am having is trying to automate this to loop through any 
one of a number of facets contained in my JSON response, not just 
country.  So instead of the above I would have something like:


rsp.facet_counts.facet_fields.VARIABLE.length

... where VARIABLE would be the name of one of the facets passed into a 
javascript function to perform the loop.  None of the javascript 
examples I can find seems to address this.  Has anyone run into this? 
Is there a better list to ask this question?



Thanks in advance.