DIH delta-import question

2010-10-15 Thread Bernd Fehling
Dear list,

I'm trying to do a delta-import with a FileDataSource datasource and the
FileListEntityProcessor. I want to load only files which are newer than
last_index_time from dataimport.properties. It looks like
newerThan=${dataimport.last_index_time} has no effect.

Can it be that newerThan is configured on the FileListEntityProcessor
but is applied to the next (nested) entity processor rather than to the
FileListEntityProcessor itself?

In my case that is the XPathEntityProcessor, which doesn't support
newerThan.
Version is solr 4.0 from trunk.

Regards,
Bernd
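
For reference, a minimal sketch of the kind of configuration being described (paths and entity names are illustrative, not the actual setup); note that the DIH wiki examples spell the variable ${dataimporter.last_index_time}, which may be worth double-checking:

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8" />
      <document>
        <!-- newerThan is declared on the FileListEntityProcessor entity;
             the question is whether it actually filters the file list here -->
        <entity name="files"
                processor="FileListEntityProcessor"
                baseDir="/data/xml"
                fileName=".*\.xml$"
                newerThan="${dataimporter.last_index_time}"
                rootEntity="false"
                dataSource="null">
          <!-- the nested XPathEntityProcessor itself has no newerThan support -->
          <entity name="doc"
                  processor="XPathEntityProcessor"
                  url="${files.fileAbsolutePath}"
                  forEach="/record">
            <field column="id" xpath="/record/id" />
          </entity>
        </entity>
      </document>
    </dataConfig>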


SolrJ API for multi core?

2010-10-15 Thread Tharindu Mathew
Hi,

Is $subject available??

Or do I need to make HTTP Get calls?


-- 
Regards,

Tharindu
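
For reference, SolrJ can address an individual core simply by putting the core name in the base URL, and core administration goes through CoreAdminRequest rather than raw HTTP GETs; a rough sketch (host, port and core name are made up):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;

    public class MultiCoreExample {
        public static void main(String[] args) throws Exception {
            // talk to one specific core by including the core name in the base URL
            SolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1");

            // core admin commands (STATUS, CREATE, RELOAD, SWAP, ...) go to the root Solr URL
            SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
            CoreAdminResponse status = CoreAdminRequest.getStatus("core1", admin);
            System.out.println("status: " + status.getStatus());
        }
    }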


Re: JVM GC troubles

2010-10-15 Thread accid
Hi,

I don't run totally OOM (no OOM exceptions in the log) but I constantly
garbage collect. While not collecting, the SOLR master handles the updates
pretty well.

Every insert is unique, so I don't have any deletes or optimizes, and all
queries are handled by the single slave instance. Is there a way to reduce
the objects held in the old gen space? It looks like the JVM is trying to
hold as many objects as possible in the cache to provide fast queries, which
are not needed in my situation.

Regarding the JBoss ... well, as I said, it's the minimalistic version of it
and we use it due to the automation process within our department. In my
test env I tried it with a plain Tomcat 6.x but without any improvement, so
the JBoss overhead is minimal to nothing.

The JVM parameters I wrote are the ones I am struggling with at the moment.
I was hoping someone would come up with a hint regarding the solrconfig.xml
itself.

PS: if anyone is questioning the implemented architecture (master - slave,
configs, schema, etc.) ... it's our architect's fault and I have to operate
it ;-)
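
One knob that commonly drives old-gen occupancy on a master that serves no queries is the set of Solr caches in solrconfig.xml; a sketch with deliberately small, illustrative sizes (not a recommendation for this particular index):

    <query>
      <!-- on an update-only master these caches mostly hold heap for queries that never come -->
      <filterCache      class="solr.LRUCache" size="64" initialSize="16" autowarmCount="0"/>
      <queryResultCache class="solr.LRUCache" size="64" initialSize="16" autowarmCount="0"/>
      <documentCache    class="solr.LRUCache" size="64" initialSize="16" autowarmCount="0"/>
    </query>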

2010/10/15 Otis Gospodnetic otis_gospodne...@yahoo.com

 Hello,

 I hope you are not running JBoss just to run Solr - there are simpler
 containers
 out there, e.g., Jetty.
 Do you OOM?
 Do things look better if you replicate less often (e.g. every 5 minutes
 instead
 of every 60 seconds)?
 Do all/some of those -X__ JVM params actually help?

 Otis
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
  From: accid ac...@gmx.net
  To: solr-user@lucene.apache.org
  Sent: Thu, October 14, 2010 1:25:34 PM
  Subject: Re: JVM GC troubles
 
  I forgot a few important details:
 
  solr version = 1.4.1
  current index  size = 50gb
  growth ~600mb / day
  jboss runs with web settings (same as  minimal)
  2010/10/14 ac...@gmx.net
 
Hi,
  
   as I am new here, I want to say hello and thanks in advance  for your
 help.
  
  
   HW Setup:
  
   1x SOLR Master  - Sun Microsystems SUN FIRE X4450 - 4 x 2,93ghz, 64gb
 ram
   1x SOLR Slave  -  Sun Microsystems SUN FIRE X4450 - 4 x 2,93ghz, 64gb
 ram
  
SW Setup:
  
   Solaris 10 Generic_142901-03
   jboss  5.1.0
   JDK 1.6 update 18
  
  
   # Specify the exact Java  VM executable to use.
   #
JAVA=/opt/appsrv/java6/bin/amd64/java
  
   #
   # Specify  options to pass to the Java VM.
   #
    JAVA_OPTS="-server -Xms6144m -Xmx6144m -Xmn3072m
    -XX:ThreadStackSize=1024 -XX:MaxPermSize=512m
    -Dorg.jboss.resolver.warning=true
    -Dsun.rmi.dgc.client.gcInterval=360
    -Dsun.rmi.dgc.server.gcInterval=360
    -Dnetworkaddress.cache.ttl=1800
    -XX:+UseConcMarkSweepGC"
  
  
   SOLR Setup:
  
    #) the master has to handle an avg. update rate of 50 updates/s and
       peaks of 400 updates/s

    #) the slave replicates every 60s using the built-in solr replication
       method (NOT rsync)

    #) the slave queries are ~20/sec
  
  
   #) schema.xml
  
  
    <field name="myname1" type="string" indexed="true" stored="false" required="true"/>
    <field name="myname2" type="int" indexed="true" stored="true" required="true"/>
    <field name="myname3" type="int" indexed="true" stored="true" required="true"/>
    <field name="myname4" type="long" indexed="true" stored="true" required="true"/>
    <field name="myname5" type="int" indexed="true" stored="true" required="true"/>
    <field name="myname6" type="string" indexed="true" stored="true" required="true"/>
    <field name="myname7" type="string" indexed="true" stored="false"/>
    <field name="myname8" type="string" indexed="true" stored="false"/>
    <field name="myname9" type="string" indexed="true" stored="false"/>
    <field name="myname10" type="long" indexed="true" stored="false"/>
    <field name="myname11" type="int" indexed="true" stored="false"/>
    <field name="myname12" type="string" indexed="true" stored="false"/>
    <field name="myname13" type="tdate" indexed="true" stored="false"/>
    <field name="myname14" type="int" indexed="true" stored="false" multiValued="true"/>
    <field name="myname15" type="string" indexed="true" stored="false" multiValued="true"/>
    <field name="myname16" type="int" indexed="true" stored="false" multiValued="true"/>
    <field name="myname17" type="string" indexed="true" stored="false" multiValued="true"/>
    <field name="myname18" type="string" indexed="true" stored="false" multiValued="true"/>
    <field name="myname19" type="string" indexed="true" stored="false" multiValued="true"/>
    <field name="myname20" type="boolean" indexed="true" stored="false"/>
    <field name="myname21" type="int" indexed="true" stored="false" required="true"/>
    <field name="myname22" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
  
  
   #) The solrconfig.xml is attached
  
  
  
   Both master & slave suffer from serious performance impact during garbage
   collection.


   I obviously have a GC problem, because ~30min after startup the Old space
   is full and not being freed up.

   Below you find a JMX copy/paste of the heap AFTER a garbage collect!! As
   you can see, even 

How do you programmatically create new cores?

2010-10-15 Thread Tharindu Mathew
Hi everyone,

I'm a newbie at this and I can't figure out how to do this after going
through http://wiki.apache.org/solr/CoreAdmin?

Any sample code would help a lot.

Thanks in advance.

-- 
Regards,

Tharindu
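
A minimal SolrJ sketch of creating a core programmatically (this assumes a multicore setup where solr.xml exists and the new core's instanceDir already contains a conf/ directory with solrconfig.xml and schema.xml; all names and paths here are made up):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class CreateCoreExample {
        public static void main(String[] args) throws Exception {
            // CoreAdmin requests go to the root Solr URL, not to an individual core
            SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // instanceDir must already contain conf/solrconfig.xml and conf/schema.xml
            CoreAdminRequest.createCore("newcore", "/opt/solr/newcore", admin);
        }
    }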


Re: SOLRJ - Searching text in all fields of a Bean

2010-10-15 Thread Subhash Bhushan
Ahmet,

I got it working to an extent.

Now:
SolrQuery query = new SolrQuery();
query.setQueryType("dismax");
query.setQuery("kitten");
query.setParam("qf", "title");


QueryResponse rsp = server.query(query);
List<SOLRTitle> beans = rsp.getBeans(SOLRTitle.class);
System.out.println(beans.size());
Iterator<SOLRTitle> it = beans.iterator();
while (it.hasNext()) {
    SOLRTitle solrTitle = (SOLRTitle) it.next();
    System.out.println(solrTitle.id);
    System.out.println(solrTitle.title);
}

*This code is able to find the record and prints the ID, but fails to print
the Title.*

Whereas:
SolrQuery query = new SolrQuery();
query.setQuery("title:kitten");

QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

Iterator<SolrDocument> iter = rsp.getResults().iterator();

while (iter.hasNext()) {
    SolrDocument resultDoc = iter.next();

    String title = (String) resultDoc.getFieldValue("title");
    String id = (String) resultDoc.getFieldValue("id"); // id is the uniqueKey field
    System.out.println(id);
    System.out.println(title);
}
*
This query succeeds!*

What am I doing wrong with the dismax params? The title field comes back as
null.

Regards,
Subhash Bhushan.


On Fri, Oct 8, 2010 at 2:05 PM, Ahmet Arslan iori...@yahoo.com wrote:

  I have two fields in the bean class, id and title.
  After adding the bean to SOLR, I want to search for, say "kitten", in all
  defined fields in the bean, like this -- query.setQuery("kitten"); --
  But I get results only when I affix the bean field name before the search
  text like this -- query.setQuery("title:kitten"); --
 
  Same case even when I use SolrInputDocument, and add these
  fields.
 
  Can we search text in all fields of a bean, without having
  to specify a
  field?

 With dismax, you can query several fields using different boosts.
 http://wiki.apache.org/solr/DisMaxQParserPlugin







problem on running fullimport

2010-10-15 Thread swapnil dubey
Hi,

I am using the full import option with the data-config file as mentioned
below

<dataConfig>
   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql:///xxx" user="xxx" password="xx" />
   <document>
      <entity name="yyy" query="select studentName from test1">
         <field column="studentName" name="studentName" />
      </entity>
   </document>
</dataConfig>


On running the full-import option I am getting the error mentioned below. I
had already included the dataimport.properties file in my conf directory. Help
me to get the issue resolved.

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">334</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</lst>
<str name="command">full-import</str>
<str name="mode">debug</str>
<null name="documents"/>
<lst name="verbose-output">
  <lst name="entity:test1">
    <lst name="document#1">
      <str name="query">select studentName from test1</str>
      <str name="EXCEPTION">
-
str name=EXCEPTION
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: select studentName from test1 Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:184)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.sql.SQLException: Illegal value for setFetchSize().
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
at com.mysql.jdbc.StatementImpl.setFetchSize(StatementImpl.java:2496)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:242)
... 33 more
      </str>
      <str name="time-taken">0:0:0.50</str>
    </lst>
  </lst>
</lst>
<str name="status">idle</str>
<str name="importResponse">Configuration Re-loaded sucessfully</str>
<lst name="statusMessages">
  <str name="Time Elapsed">0:0:0.299</str>
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Processed">0</str>
  <str 

Re: problem on running fullimport

2010-10-15 Thread Ken Stanley
On Fri, Oct 15, 2010 at 7:42 AM, swapnil dubey swapnil.du...@gmail.comwrote:

 Hi,

 I am using the full import option with the data-config file as mentioned
 below

  <dataConfig>
     <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                 url="jdbc:mysql:///xxx" user="xxx" password="xx" />
     <document>
        <entity name="yyy" query="select studentName from test1">
           <field column="studentName" name="studentName" />
        </entity>
     </document>
  </dataConfig>


  on running the full-import option I am getting the error mentioned below. I
  had already included the dataimport.properties file in my conf directory. Help
  me to get the issue resolved

 response
 -
 lst name=responseHeader
 int name=status0/int
 int name=QTime334/int
 /lst
 -
 lst name=initArgs
 -
 lst name=defaults
 str name=configdata-config.xml/str
 /lst
 /lst
 str name=commandfull-import/str
 str name=modedebug/str
 null name=documents/
 -
 lst name=verbose-output
 -
 lst name=entity:test1
 -
 lst name=document#1
 str name=queryselect studentName from test1/str
 -
 str name=EXCEPTION
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: select studentName from test1 Processing Document # 1
 ...

 --
 Regards
 Swapnil Dubey


Swapnil,

Everything looks fine, except that in your entity definition you forgot to
define which datasource you wish to use. So if you add
dataSource="JdbcDataSource" to the entity, that should get rid of your
exception. As a reminder, the DataImportHandler wiki
(http://wiki.apache.org/solr/DataImportHandler) on Apache's website is very
helpful for learning how to use the DIH properly. Having a printed copy beside
me has helped for easy and quick reference.

- Ken
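
A sketch of the config with that suggestion applied (the datasource gets a name and the entity references it explicitly; the batchSize comment describes a commonly cited workaround for the MySQL setFetchSize() error, not something from this thread):

    <dataConfig>
      <!-- batchSize="-1" is a commonly suggested workaround for
           "Illegal value for setFetchSize()" with the MySQL driver -->
      <dataSource name="JdbcDataSource" type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver" batchSize="-1"
                  url="jdbc:mysql:///xxx" user="xxx" password="xx" />
      <document>
        <entity name="yyy" dataSource="JdbcDataSource"
                query="select studentName from test1">
          <field column="studentName" name="studentName" />
        </entity>
      </document>
    </dataConfig>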


Re: SOLRJ - Searching text in all fields of a Bean

2010-10-15 Thread Subhash Bhushan
Hi Savvas,

Thanks!! Was able to search using copyField/ directive.

I was using the default example schema packaged with solr. I added the
following directive for title field and reindexed data:
*<copyField source="title" dest="text"/>*

Regards,
Subhash Bhushan.

On Fri, Oct 8, 2010 at 2:09 PM, Savvas-Andreas Moysidis 
savvas.andreas.moysi...@googlemail.com wrote:

 Hello,

 What does your schema look like? Have you defined a  catch all field and
 copy every value from all your other fields in it with a copyField /
 directive?

 Cheers,
 -- Savvas


 On 8 October 2010 08:30, Subhash Bhushan subhash.bhus...@stratalabs.inwrote:

 Hi,

  I have two fields in the bean class, id and title.
  After adding the bean to SOLR, I want to search for, say "kitten", in all
  defined fields in the bean, like this -- query.setQuery("kitten"); --
  But I get results only when I affix the bean field name before the search
  text like this -- query.setQuery("title:kitten"); --

 Same case even when I use SolrInputDocument, and add these fields.

 Can we search text in all fields of a bean, without having to specify a
 field?
 If we can, what am I missing in my code?

 *Code:*
 Bean:
 ---
 public class SOLRTitle {
     @Field
     public String id = "";
     @Field
     public String title = "";
 }
 ---
 Indexing function:
 ---

 private static void uploadData() {
     try {
         ... // Get Titles
         List<SOLRTitle> solrTitles = new ArrayList<SOLRTitle>();
         Iterator<Title> it = titles.iterator();
         while (it.hasNext()) {
             Title title = (Title) it.next();
             SOLRTitle solrTitle = new SOLRTitle();
             solrTitle.id = title.getID().toString();
             solrTitle.title = title.getTitle();
             solrTitles.add(solrTitle);
         }
         server.addBeans(solrTitles);
         server.commit();
     } catch (SolrServerException e) {
         e.printStackTrace();
     } catch (IOException e) {
         e.printStackTrace();
     }
 }
 ---
 Querying function:
 ---

 private static void queryData() {
     try {
         SolrQuery query = new SolrQuery();
         query.setQuery("kitten");

         QueryResponse rsp = server.query(query);
         List<SOLRTitle> beans = rsp.getBeans(SOLRTitle.class);
         System.out.println(beans.size());
         Iterator<SOLRTitle> it = beans.iterator();
         while (it.hasNext()) {
             SOLRTitle solrTitle = (SOLRTitle) it.next();
             System.out.println(solrTitle.id);
             System.out.println(solrTitle.title);
         }
     } catch (SolrServerException e) {
         e.printStackTrace();
     }
 }
 --

 Subhash Bhushan.





Re: Quick question on indexing an existing index

2010-10-15 Thread Jan Høydahl / Cominvent
Why don't you simply index the source content which you used to build index2 
into index1, i.e. have your tool index to both? You won't save anything by 
trying to extract that content from an existing index. But of course, you COULD 
write yourself a tool which extracts all stored fields for all documents in 
index2, transforms them into docs that fit index1, and then inserts them. But 
how will you support deletes etc.?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14. okt. 2010, at 17.06, bbarani wrote:

 
 Hi,
 
 I have a very simple question about indexing an existing index.
 
 We have 2 indexes: index 1 is maintained by us (it indexes the data from
 a database), and index 2 is maintained by a tool.
 
 Both schemas are totally different, but we want to re-index the content of
 index2 into index1 so that we end up with just one single index (index 1)
 containing the data from both indexes.
 
 We want to re-index index 2 using the schema present for index 1. We are
 also interested in customizing the data (something like selecting
 columns / fields from the DB using the DB import handler).
 
 Thanks,
 BB
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Quick-question-on-indexing-an-existing-index-tp1701663p1701663.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Exception being thrown indexing a specific pdf document using Solr Cell

2010-10-15 Thread Shaun Campbell
I've got an existing Spring/SolrJ Solr application that indexes a mixture of
documents. It seems to have been working fine for a couple of weeks, but
today I've started getting an exception when processing a certain PDF
file.

The exception is :

ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@4683c2
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
at
uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308)
at
uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
at
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
at
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
at
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
at
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
at
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
at
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
at
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
at
org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
at
org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
at
org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
at
org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
at
org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
at
org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
at
org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
at
org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at

Re: Term is duplicated when updating a document

2010-10-15 Thread Israel Ekpo
Which fields are modified when the document is updated/replaced?

Are there any differences in the content of the fields that you are using
for the AutoSuggest?

Have you changed your schema.xml file recently? If you have, then there may
have been changes in the way these fields are analyzed and broken down into
terms.

This may be a bug if you did not change the field or the schema file but the
terms count is changing.

On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer spam_ea...@gmx.net wrote:

 Hi,

 we are updating our documents (that represent products in our shop) when a
 dealer modifies them, by calling
 SolrServer.add(SolrInputDocument) with the updated document.

 My understanding is that there is no other way of updating an existing
 document.


 However we also use a term query to autocomplete the search field for the
 user, but each time a document is updated (added) the term count is
 incremented. So after starting with a new index the count is e.g. 1, then
 the document (that contains that term) is updated, and the count is 2, the
 next update will set this to 3 and so on.

 Once the index is optimized (by calling SolrServer.optimize()) the count is
 correct again.

 Am I missing something or is this a bug in Solr/Lucene?

 Thanks in advance
 Thomas




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Possible to sort by explicit docid order?

2010-10-15 Thread Jan Høydahl / Cominvent
Hi,

In an online bookstore project I'm working on, most frontend widgets are search 
driven. Most often they query with some filters and a sort order, such as 
availabledate desc or simply by score.

However, to allow editorial control, some widgets will display a fixed list of 
books, defined as an ordered list of ISBN numbers inserted by the editor. Based 
on this we do a Solr search to fetch the data to display: 
fq=isbn:(9788200011699 OR 9788200012658 OR ...)

It is important to return the results in the same order as the explicitly given 
list of ISBNs. But I cannot see a way to do that, not even with sort by 
function. So currently we re-order the result list in the frontend.

Would it make sense with an explicit sort order, perhaps implemented as a 
function?

sort=fieldvaluelist(isbn,1000,1,0,$isbnorder) desc, price asc&isbnorder=9788200011699,9788200012658,9788200013839,9788200014140

The function would be defined as
  
fieldvaluelist(field,startvalue,gap,fallback,field-value[,field-value...])
The output of the example above would be:
  For document with ISBN=9788200011699: 1000
  For document with ISBN=9788200012658: 999
  For document with ISBN=9788200013839: 998
  For document with ISBN not in the list: 0 (fallback - in which case the 
second sort order would kick in)
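
Until something along those lines exists, the frontend re-ordering mentioned above only takes a few lines; a sketch in Java (the document representation and field name are illustrative):

    import java.util.*;

    public class IsbnOrder {
        /** Reorder documents (each a field map) to follow an explicit ISBN list;
            unknown ISBNs sink to the end, mirroring the "fallback" in the proposal. */
        public static void reorder(List<Map<String, Object>> docs, List<String> isbns) {
            final Map<String, Integer> rank = new HashMap<String, Integer>();
            for (int i = 0; i < isbns.size(); i++) {
                rank.put(isbns.get(i), i);
            }
            Collections.sort(docs, new Comparator<Map<String, Object>>() {
                public int compare(Map<String, Object> a, Map<String, Object> b) {
                    return pos(a) - pos(b);
                }
                private int pos(Map<String, Object> doc) {
                    Integer p = rank.get(String.valueOf(doc.get("isbn")));
                    return p != null ? p : Integer.MAX_VALUE;
                }
            });
        }
    }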

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Term is duplicated when updating a document

2010-10-15 Thread Thomas Kellerer

Thanks for the answer.


Which fields are modified when the document is updated/replaced.


Only one field was changed, but it was not the one where the auto-suggest term 
is coming from.


Are there any differences in the content of the fields that you are using
for the AutoSuggest.

No


Have you changed you schema.xml file recently? If you have, then there may
have been changes in the way these fields are analyzed and broken down to
terms.


No, I did a complete index rebuild to rule out things like that.
Then after startup, did a search, then updated the document and did a search 
again.

Regards
Thomas
 

This may be a bug if you did not change the field or the schema file but the
terms count is changing.

On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellererspam_ea...@gmx.net  wrote:


Hi,

we are updating our documents (that represent products in our shop) when a
dealer modifies them, by calling
SolrServer.add(SolrInputDocument) with the updated document.

My understanding is, that there is no other way of updating an existing
document.


However we also use a term query to autocomplete the search field for the
user, but each time adocument is updated (added) the term count is
incremented. So after starting with a new index the count is e.g. 1, then
the document (that contains that term) is updated, and the count is 2, the
next update will set this to 3 and so on.

One the index is optimized (by calling SolServer.optimize()) the count is
correct again.

Am I missing something or is this a bug in Solr/Lucene?

Thanks in advance
Thomas










Re: searching while importing

2010-10-15 Thread Gora Mohanty
On Thu, Oct 14, 2010 at 4:08 AM, Shawn Heisey s...@elyograg.org wrote:
  If you are using the DataImportHandler, you will not be able to search new
 data until the full-import or delta-import is complete and the update is
 committed.  When I do a full reindex, it takes about 5 hours, and until it
 is finished, I cannot search it.

 I have not tried to issue a manual commit in the middle of an import to see
 whether that makes data inserted up to that point searchable, but I would
 not expect that to work.
[...]

Just as a data point, we have done this, and yes it is possible to do a commit
in the middle of an import, and have the documents that have already been
indexed be available for search.

Regards,
Gora
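
For reference, the commit described here can either be posted manually to the /update handler (e.g. http://localhost:8983/solr/update?commit=true) while the import is running, or triggered automatically via autoCommit in solrconfig.xml; a minimal sketch (thresholds are illustrative):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>   <!-- commit after this many pending docs -->
        <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
      </autoCommit>
    </updateHandler>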


filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
At the Lucene Revolution conference I asked about efficiently building a filter 
query from an external list of Solr unique ids.

Some use cases I can think of are:
1)  personal sub-collections (in our case a user can create a small subset 
of our 6.5 million doc collection and then run filter queries against it)
2)  tagging documents
3)  access control lists
4)  anything that needs complex relational joins
5)  a sort of alternative to incremental field updating (i.e. update in an 
external database or kv store)
6)  Grant's clustering cluster points and similar apps.

Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't seem to be 
any work on it yet.

Hoss  mentioned a couple of ideas:
1) sub-classing query parser
2) Having the app query a database and somehow passing something to 
Solr or lucene for the filter query

Can Hoss or someone else point me to more detailed information on what might be 
involved in the two ideas listed above?

Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids 
needed to implement this or is that a separate issue?


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search






RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Jonathan Rochkind
Definitely interested in this. 

The naive obvious approach would be just putting all the ID's in the query,
like fq=(id:1 OR id:2 OR ...). Or making it another clause in the 'q'.

Can you outline what's wrong with this approach, to make it more clear what's 
needed in a solution?

From: Burton-West, Tom [tburt...@umich.edu]
Sent: Friday, October 15, 2010 11:49 AM
To: solr-user@lucene.apache.org
Subject: filter query from external list of Solr unique IDs

At the Lucene Revolution conference I asked about efficiently building a filter 
query from an external list of Solr unique ids.

Some use cases I can think of are:
1)  personal sub-collections (in our case a user can create a small subset 
of our 6.5 million doc collection and then run filter queries against it)
2)  tagging documents
3)  access control lists
4)  anything that needs complex relational joins
5)  a sort of alternative to incremental field updating (i.e. update in an 
external database or kv store)
6)  Grant's clustering cluster points and similar apps.

Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't seem to be 
any work on it yet.

Hoss  mentioned a couple of ideas:
1) sub-classing query parser
2) Having the app query a database and somehow passing something to 
Solr or lucene for the filter query

Can Hoss or someone else point me to more detailed information on what might be 
involved in the two ideas listed above?

Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids 
needed to implement this or is that a separate issue?


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search






Re: filter query from external list of Solr unique IDs

2010-10-15 Thread Yonik Seeley
On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom tburt...@umich.edu wrote:
 At the Lucene Revolution conference I asked about efficiently building a 
 filter query from an external list of Solr unique ids.

Yeah, I've thought about a special query parser and query to deal with
this (relatively) efficiently, both from a query perspective and a
memory perspective.

Should be pretty quick to throw together:
- comma separated list of terms (unique ids are a special case of this)
- in the query, store as a single byte array for efficiency
- sort the ids if they aren't already sorted
- do lookups with a term enumerator and skip weighting or anything
else like that
- configurable caching... may, or may not want to cache this big query

That's only part of the stuff you mention, but seems like it would be
useful to a number of people.

-Yonik
http://www.lucidimagination.com


Re: Sorting on arbitrary 'custom' fields

2010-10-15 Thread Simon Wistow
On Mon, Oct 11, 2010 at 07:17:43PM +0100, me said:
 It was just an idea though and I was hoping that there would be a 
 simpler more orthodox way of doing it.

In the end, for anyone who cares, we used dynamic fields.

There are a lot of them but we haven't seen performance impacted that 
badly so far.






Re: weighted facets

2010-10-15 Thread Peter Karich
Hi,

answering my own question(s).

Result grouping could be the solution as I explained here:
https://issues.apache.org/jira/browse/SOLR-385

 http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf (the file is dated to Aug 
 2008)

yonik implemented this here:
https://issues.apache.org/jira/browse/SOLR-153

So, really cool: he's the inventor/first-thinker of their 'bitset tree'
! :-)
http://search.lucidimagination.com/search/document/6ccbec5e602687ae/facet_optimizing#6ccbec5e602687ae

Regards,
Peter.

 Hi,

 I need a feature which is well explained from Mr Goll at this site **

 So, it then would be nice to do sth. like:

 facet.stats=sum(fieldX)&facet.stats.sort=fieldX

 And the output (sorted against the sum-output) can look sth. like this:
 <lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="tag">
      <int name="jobs"  fieldX="14700767"></int>
      <int name="video" fieldX="13700892"></int>

 Is there something similar or was this answered from Hoss at the lucene
 revolution? If not I'll open a JIRA issue ...


 BTW: is the work from
 http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf contributed back to
 solr?


 Regards,
 Peter.



 PS: Related issue:
 https://issues.apache.org/jira/browse/SOLR-680
 https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch



 **
 http://lucene.crowdvine.com/posts/14137409

 Quoting his question in case the site goes offline:

 Hi Chris,

 Usually a facet search returns the document count for the
 unique values in the facet field. Is there a way to
 return a weighted facet count based on a user-defined function (sum,
 product, etc.) of another field?

 Here is a sum example. Assume we have the following
 4 documents with 3 fields

 ID facet_field weight_field
 1 solr 0.4
 2 lucene 0.3
 3 lucene 0.1
 4 lucene 0.2

 Is there a way to return

 solr 0.4
 lucene 0.6

 instead of

 solr 1
 lucene 3

 Given the facet_field contains multiple values

 ID facet_field weight_field
 1 solr lucene 0.2
 2 lucene 0.3
 3 solr lucene 0.1
 4 lucene 0.2

 Is there a way to return

 solr 0.3
 lucene 0.8

 instead of

 solr 2
 lucene 4

 Thanks,
 Johannes

   


-- 
http://jetwick.com twitter search prototype



Re: Term is duplicated when updating a document

2010-10-15 Thread Erick Erickson
This is actually known behavior. The problem is that when you update
a document, it's re-added and the original is only marked as
deleted. However, the terms aren't touched; both the original and the new
document's terms are counted. It'd be hard, very hard, to remove
the terms from the inverted index efficiently.

But when you optimize, all the deleted documents (and their associated
terms) are physically removed from the files, thus your term counts change.

HTH
Erick

On Fri, Oct 15, 2010 at 10:05 AM, Thomas Kellerer spam_ea...@gmx.netwrote:

 Thanks for the answer.


  Which fields are modified when the document is updated/replaced.


 Only one field was changed, but it was not the one where the auto-suggest
 term is coming from.


  Are there any differences in the content of the fields that you are using
 for the AutoSuggest.

 No


  Have you changed you schema.xml file recently? If you have, then there may
 have been changes in the way these fields are analyzed and broken down to
 terms.


 No, I did a complete index rebuild to rule out things like that.
 Then after startup, did a search, then updated the document and did a
 search again.

 Regards
 Thomas



 This may be a bug if you did not change the field or the schema file but
 the
 terms count is changing.

 On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellererspam_ea...@gmx.net
  wrote:

  Hi,

 we are updating our documents (that represent products in our shop) when
 a
 dealer modifies them, by calling
 SolrServer.add(SolrInputDocument) with the updated document.

 My understanding is, that there is no other way of updating an existing
 document.


 However we also use a term query to autocomplete the search field for the
 user, but each time adocument is updated (added) the term count is
 incremented. So after starting with a new index the count is e.g. 1, then
 the document (that contains that term) is updated, and the count is 2,
 the
 next update will set this to 3 and so on.

 One the index is optimized (by calling SolServer.optimize()) the count is
 correct again.

 Am I missing something or is this a bug in Solr/Lucene?

 Thanks in advance
 Thomas









RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Demian Katz
The main problem I've encountered with the lots of OR clauses approach is 
that you eventually hit the limit on Boolean clauses and the whole query fails. 
 You can keep raising the limit through the Solr configuration, but there's 
still a ceiling eventually.

- Demian
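
For reference, the ceiling mentioned here is maxBooleanClauses in the query section of solrconfig.xml; raising it only postpones the failure:

    <query>
      <!-- default is 1024; each id clause in a big OR query counts against this limit -->
      <maxBooleanClauses>1024</maxBooleanClauses>
    </query>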

 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Friday, October 15, 2010 1:07 PM
 To: solr-user@lucene.apache.org
 Subject: RE: filter query from external list of Solr unique IDs
 
 Definitely interested in this.
 
 The naive obvious approach would be just putting all the ID's in the
 query. Like fq=(id:1 OR id:2 OR).  Or making it another clause in
 the 'q'.
 
 Can you outline what's wrong with this approach, to make it more clear
 what's needed in a solution?
 
 From: Burton-West, Tom [tburt...@umich.edu]
 Sent: Friday, October 15, 2010 11:49 AM
 To: solr-user@lucene.apache.org
 Subject: filter query from external list of Solr unique IDs
 
 At the Lucene Revolution conference I asked about efficiently building
 a filter query from an external list of Solr unique ids.
 
 Some use cases I can think of are:
 1)  personal sub-collections (in our case a user can create a small
 subset of our 6.5 million doc collection and then run filter queries
 against it)
 2)  tagging documents
 3)  access control lists
 4)  anything that needs complex relational joins
 5)  a sort of alternative to incremental field updating (i.e.
 update in an external database or kv store)
 6)  Grant's clustering cluster points and similar apps.
 
 Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't
 seem to be any work on it yet.
 
 Hoss  mentioned a couple of ideas:
 1) sub-classing query parser
 2) Having the app query a database and somehow passing
 something to Solr or lucene for the filter query
 
 Can Hoss or someone else point me to more detailed information on what
 might be involved in the two ideas listed above?
 
 Is somehow keeping an up-to-date map of unique Solr ids to internal
 Lucene ids needed to implement this or is that a separate issue?
 
 
 Tom Burton-West
 http://www.hathitrust.org/blogs/large-scale-search
 
 
 



RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
Hi Jonathan,

The advantages of the obvious approach you outline are that it is simple, it 
fits into the existing Solr model, and it doesn't require any customization or 
modification to the Solr/Lucene Java code.  Unfortunately, it does not scale well.  
We originally tried just what you suggest for our implementation of Collection 
Builder.  For a user's personal collection we had a table that maps the 
collection id to the unique Solr ids.
Then when they wanted to search their collection, we just took their search and 
added a filter query with fq=(id:1 OR id:2 OR ...).   I seem to remember 
running into a limit on the number of OR clauses allowed. Even if you can set 
that limit larger, there are a number of efficiency issues.  

We ended up constructing a separate Solr index where we have a multi-valued 
collection number field. Unfortunately, until incremental field updating gets 
implemented, this means that every time someone adds a document to a 
collection, the entire document (including 700KB of OCR) needs to be re-indexed 
just to update the collection number field. This approach has allowed us to 
scale up to a total of something under 100,000 documents, but we don't think we 
can scale it much beyond that for various reasons.

I was actually thinking of some kind of custom Lucene/Solr component that would, 
for example, take a query parameter such as lookitUp=123, and the component 
might do a JDBC query against a database or kv store and return results in some 
form that would be efficient for Solr/Lucene to process. (Of course this 
assumes that a JDBC query would be more efficient than just sending a long list 
of ids to Solr.)  The other part of the equation is mapping the unique Solr ids 
to internal Lucene ids in order to implement a filter query.   I was wondering 
if something like the unique id to Lucene id mapper in zoie might be useful or 
if that is too specific to zoie. So this may be totally off-base, since I 
haven't looked at the zoie code at all yet.

In our particular use case, we might be able to build some kind of in-memory 
map after we optimize an index and before we mount it in production. In our 
workflow, we update the index and optimize it before we release it and once it 
is released to production there is no indexing/merging taking place on the 
production index (so the internal Lucene ids don't change.)  

Tom



-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Friday, October 15, 2010 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: filter query from external list of Solr unique IDs

Definitely interested in this. 

The naive obvious approach would be just putting all the ID's in the query. 
Like fq=(id:1 OR id:2 OR).  Or making it another clause in the 'q'.  

Can you outline what's wrong with this approach, to make it more clear what's 
needed in a solution?



facet.field :java.lang.NullPointerException

2010-10-15 Thread Pradeep Singh
Faceting blows up when the field has no data, and this seems to be random.
Sometimes it will work even with no data, other times not. Sometimes the
error goes away if the field is set to multiValued=true (even though it's
one value every time), other times it doesn't. In all cases setting
facet.method to enum takes care of the problem. If this param is not set,
the default leads to a null pointer exception.
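
For reference, the workaround described above is just a request parameter, e.g. (core URL and field name are illustrative):

    http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=xyz&facet.method=enum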


09:18:52,218 SEVERE [SolrCore] Exception during facet.field of
xyz:java.lang.NullPointerException

  at java.lang.System.arraycopy(Native Method)

  at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247)

  at
org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164)

  at
org.apache.solr.request.NumberedTermsEnum.init(UnInvertedField.java:960)

  at
org.apache.solr.request.TermIndex$1.init(UnInvertedField.java:1151)

  at
org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151)

  at
org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204)

  at
org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:188)

  at
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911)

  at
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298)

  at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354)

  at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190)

  at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)

  at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)

  at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)

  at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)

  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at


Re: facet.field :java.lang.NullPointerException

2010-10-15 Thread Yonik Seeley
This is https://issues.apache.org/jira/browse/SOLR-2142
I'll look into it soon.
-Yonik
http://www.lucidimagination.com



On Fri, Oct 15, 2010 at 3:12 PM, Pradeep Singh pksing...@gmail.com wrote:
 Faceting blows up when the field has no data. And this seems to be random.
 Sometimes it will work even with no data, other times not. Sometimes the
 error goes away if the field is set to multiValued=true (even though it's
 one value every time), other times it doesn't. In all cases setting
 facet.method to enum takes care of the problem. If this param is not set,
 the default leads to null pointer exception.


 09:18:52,218 SEVERE [SolrCore] Exception during facet.field of
 xyz:java.lang.NullPointerException

      at java.lang.System.arraycopy(Native Method)

      at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247)

      at
 org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164)

      at
 org.apache.solr.request.NumberedTermsEnum.init(UnInvertedField.java:960)

      at
 org.apache.solr.request.TermIndex$1.init(UnInvertedField.java:1151)

      at
 org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151)

      at
 org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204)

      at
 org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:188)

      at
 org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911)

      at
 org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298)

      at
 org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354)

      at
 org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190)

      at
 org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)

      at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)

      at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)

      at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)

      at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
                at



Re: Synchronizing Solr with a PostgreDB

2010-10-15 Thread Juan Manuel Alvarez
Thanks for the quick response! =o)
We will go with that approach.

On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley a...@roxxor.co.uk wrote:
 i would not cross-reference solr results with your database to merge unless 
 you want to spank your database. nor would i load solr with all your data. 
 what i have found is that the search results page is generally a small subset 
 of data relating to the fuller document/result. therefore i store only the 
 data required to present the search results wholly from solr. the user can 
 choose to click into a specific result which then uses just the database to 
 present it.

 use data import handler - define an xml config to import as many entities 
 into your document as you need and map columns to fields in schema.xml. use 
 the Wiki page on DIH - it's all there, as well as example config in the solr 
 distro.
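
 A rough sketch of such a DIH config against Postgres (the driver class is the standard org.postgresql.Driver; table, column and connection details are made up; the deltaQuery/deltaImportQuery pair is only needed if synchronization is done with delta-import rather than by re-posting documents on every change):

     <dataConfig>
       <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
                   url="jdbc:postgresql://localhost/mydb" user="solr" password="secret" />
       <document>
         <entity name="item"
                 query="select id, title, description from item"
                 deltaQuery="select id from item where updated_at > '${dataimporter.last_index_time}'"
                 deltaImportQuery="select id, title, description from item where id = '${dataimporter.delta.id}'">
           <field column="id" name="id" />
           <field column="title" name="title" />
           <field column="description" name="description" />
         </entity>
       </document>
     </dataConfig>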

 allistair

 On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez wrote:

 Hello everyone! I am new to Solr and Lucene and I would like to ask
 you a couple of questions.

 I am working on an existing system that has the data saved in a
 Postgre DB and now I am trying to integrate Solr to use full-text
 search and faceted search, but I am having a couple of doubts about
 it.

 1) I see two ways of storing the data and make the search:
 - Duplicate all the DB data in Solr, so complete results are returned
 from a search query, or...
 - Put in Solr just the data that I need to search and, after finding
 the elements with a Solr query, use the result to make a more specific
 query to the DB.

 Which is the way this is normally done?

 2) How do I synchronize Solr and Postgre? Do I have to use the
 DataImportHandler or when I do the INSERT command into Postgre, I have
 to execute a command into Solr?

 Thanks for your time!

 Cheers!
 Juan M.




Re: SOLRJ - Searching text in all fields of a Bean

2010-10-15 Thread Ahmet Arslan
You can replace query.setQueryType("dismax") with query.set("defType",
"dismax");

Also don't forget to request the title field with the fl parameter:
query.addField("title");
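
Putting the two suggestions together, the dismax version of the query would look roughly like this (field names as in the earlier code):

    SolrQuery query = new SolrQuery();
    query.set("defType", "dismax");   // instead of setQueryType("dismax")
    query.setQuery("kitten");
    query.setParam("qf", "title");
    query.addField("id");
    query.addField("title");          // explicitly ask for the fields you want back

    QueryResponse rsp = server.query(query);
    List<SOLRTitle> beans = rsp.getBeans(SOLRTitle.class);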



  

Re: Solr with example Jetty and score problem

2010-10-15 Thread Chris Hostetter

: Thanks. But do you have any suggest or work-around to deal with it?

Posted in SOLR-2140

   <field name="score" type="ignored" multiValued="false" /> 

...the key is to make sure Solr knows score is not multiValued


-Hoss


Re: ant build problem

2010-10-15 Thread Chris Hostetter

: i updated my solr trunk to revision 1004527. when i go for compiling
: the trunk with ant i get so many warnings, but the build is successful. the

Most of these warnings are legitimate; the problems have always been 
there, but recently the Lucene build file was updated to warn about them 
by default.

This one though...
: [javac] warning: [path] bad path element
: /usr/share/ant/lib/hamcrest-core.jar: no such file or directory

...that's something specific to your setup.  Something in your system's ant 
configs thinks that jar should be there.

: After the compiling i thought to check with the ant test and performed but
: it is failed..

Failing tests are also a possibility ... there are several tests in the 
code base right now that fail sporadically (especially because of recent 
changes to the build system designed to get tests that *might* fail 
based on locale to fail more often) and people are working on them -- 
without full details about what failures you got though, we can't say if they 
are known issues.


-Hoss


Re: having problem about Solr Date Field.

2010-10-15 Thread Chris Hostetter

: So, regarding DST, do you put everything in GMT, and make adjustments 
: for it in the 'search for/between' date/time values before the query, for 
: both DST and TZ?

The client adding docs is the only one that knows what TZ it's in when it 
formats the docs to add them, and the client issuing the query is the 
only one that knows what TZ it's in when it formats the query string to 
execute the query.  In both cases the client must use the UTC TZ when 
formatting the date strings so that Solr can deal with them correctly.
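
A sketch of what that looks like on the client side in Java; the format string is the one Solr's DateField expects:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class SolrDateFormat {
        public static String toSolrDate(Date d) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // always format in UTC, whatever the local TZ/DST
            return fmt.format(d);                         // e.g. 2010-10-15T18:23:24Z
        }
    }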


-Hoss


Re: Question related to phrase search in lucene/solr?

2010-10-15 Thread Chris Hostetter

: I have a question: is it possible to perform a phrase search with wildcards in 
: solr/lucene? I have two queries which both have exactly the same results. One is
: +Contents:"change market"
: 
: and the other is 
: +Contents:"change* market"
: 
: but I think the second should match "changes market" as well, but it does not 
: match it. Any help would be appreciated

In my experience, 90% of the time when people ask about using wildcards in a 
phrase query what they really want is simple stemming of the terms -- the 
one example you've cited is an example of this.  If your Contents field 
uses an analyzer that does stemming then "change market" and "changes 
market" would both match.



-Hoss


Re: Disable (or prohibit) per-field overrides

2010-10-15 Thread Chris Hostetter

: Anyone knows useful method to disable or prohibit the per-field override 
: features for the search components? If not, where to start to make it 
: configurable via solrconfig and attempt to come up with a working patch?

If your goal is to prevent *clients* from specifying these (while you're 
still allowed to use them in your defaults) then the simplest solution is 
probably something external to Solr -- along the lines of mod_rewrite.

Internally...

that would be tough.

You could probably write a SearchComponent (configured to run first) 
that does it fairly easily -- just wrap the SolrParams in an impl that 
returns null anytime a component asks for a param name that starts with 
"f." (and excludes those param names when asked for a list of the param 
names) 
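
A rough, untested sketch of that component (the exact set of SolrInfoMBean methods to implement may differ between Solr versions):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class StripPerFieldParamsComponent extends SearchComponent {

      public void prepare(ResponseBuilder rb) throws IOException {
        final SolrParams original = rb.req.getParams();
        // wrap the request params so anything starting with "f." is hidden
        rb.req.setParams(new SolrParams() {
          public String get(String param) {
            return param.startsWith("f.") ? null : original.get(param);
          }
          public String[] getParams(String param) {
            return param.startsWith("f.") ? null : original.getParams(param);
          }
          public Iterator<String> getParameterNamesIterator() {
            List<String> names = new ArrayList<String>();
            for (Iterator<String> it = original.getParameterNamesIterator(); it.hasNext();) {
              String name = it.next();
              if (!name.startsWith("f.")) {
                names.add(name);
              }
            }
            return names.iterator();
          }
        });
      }

      public void process(ResponseBuilder rb) throws IOException {
        // nothing to do at process time; the wrapping happened in prepare()
      }

      public String getDescription() {
        return "Hides per-field (f.*) override parameters from later components";
      }
      public String getSource() { return null; }
      public String getSourceId() { return null; }
      public String getVersion() { return null; }
    }

It would then be registered with a searchComponent element in solrconfig.xml and listed in the handler's first-components so it runs before the standard components.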


It could probably be generalized to support arbitrary rules in a way 
that might be handy for other folks, but it would still just be 
wrapping all of the params, so it would prevent you from using them 
in your config as well.

Ultimately I think a general solution would need to be in 
RequestHandlerBase ... where it wraps the request params using the 
defaults and invariants ... you'd want the custom exclusion rules to apply 
only to the request params from the client.




-Hoss


RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
Thanks Yonik,

Is this something you might have time to throw together, or an outline of what 
needs to be thrown together?
Is this something that should be asked on the developer's list or discussed in 
SOLR 1715 or does it make the most sense to keep the discussion in this thread?

Tom

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, October 15, 2010 1:19 PM
To: solr-user@lucene.apache.org
Subject: Re: filter query from external list of Solr unique IDs

On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom tburt...@umich.edu wrote:
 At the Lucene Revolution conference I asked about efficiently building a 
 filter query from an external list of Solr unique ids.
Yeah, I've thought about a special query parser and query to deal with
this (relatively) efficiently, both from a query perspective and a
memory perspective.

Should be pretty quick to throw together:
- comma separated list of terms (unique ids are a special case of this)
- in the query, store as a single byte array for efficiency
- sort the ids if they aren't already sorted
- do lookups with a term enumerator and skip weighting or anything
else like that
- configurable caching... may, or may not want to cache this big query

That's only part of the stuff you mention, but seems like it would be
useful to a number of people.

-Yonik
http://www.lucidimagination.com


SOLR DateTime and SortableLongField field type problems

2010-10-15 Thread Ken Stanley
Hello all,

I am using SOLR-1.4.1 with the DataImportHandler, and I am trying to follow
the advice from
http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about
converting date fields to SortableLong fields for better memory efficiency.
However, whenever I try to do this using the DateFormater, I get exceptions
when indexing for every row that tries to create my sortable fields.

In my schema.xml, I have the following definitions for the fieldType and
dynamicField:

<fieldType name="sdate" class="solr.SortableLongField" indexed="true"
    stored="false" sortMissingLast="true" omitNorms="true" />
<dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true" />

In my dih.xml, I have the following definitions:

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
        <entity
            name="xml_stories"
            rootEntity="false"
            dataSource="null"
            processor="FileListEntityProcessor"
            fileName="legacy_stories.*\.xml$"
            recursive="false"
            baseDir="/usr/local/extracts"
            newerThan="${dataimporter.xml_stories.last_index_time}"
        >
            <entity
                name="stories"
                pk="id"
                dataSource="xml_stories"
                processor="XPathEntityProcessor"
                url="${xml_stories.fileAbsolutePath}"
                forEach="/RECORDS/RECORD"
                stream="true"
                transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
                onError="continue"
            >
                <field column="_modified_date"
                    xpath="/RECORDS/RECORD/PROP[@NAME='R_ModifiedTime']/PVAL" />
                <field column="modified_date" sourceColName="_modified_date"
                    dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

                <field column="_df_date_published"
                    xpath="/RECORDS/RECORD/PROP[@NAME='R_StoryDate']/PVAL" />
                <field column="df_date_published" sourceColName="_df_date_published"
                    dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

                <field column="sort_date_modified" sourceColName="modified_date"
                    dateTimeFormat="yyyyMMddhhmmss" />
                <field column="sort_date_published" sourceColName="df_date_published"
                    dateTimeFormat="yyyyMMddhhmmss" />
            </entity>
        </entity>
    </document>
</dataConfig>

The fields in question are in the formats:

<RECORDS>
  <RECORD>
    <PROP NAME="R_StoryDate">
      <PVAL>2001-12-04T00:00:00Z</PVAL>
    </PROP>
    <PROP NAME="R_ModifiedTime">
      <PVAL>2001-12-04T19:38:01Z</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

The exception that I am receiving is:

Oct 15, 2010 6:23:24 PM
org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: Wed Nov 28 21:39:05 EST 2007
at java.text.DateFormat.parse(DateFormat.java:337)
at
org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
at
org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

I know that it has to be the SortableLong fields, because if I remove just
those two lines from my dih.xml, everything imports as I expect it to. Am I
doing something wrong, or misusing the SortableLong and/or DateFormatTransformer? Is
this not supported in my version of SOLR? I'm not very experienced with
Java, so digging into the code would be a lost cause for me right now. I was
hoping that somebody here might be able to help point me in the
right/correct direction.

It should be noted that the modified_date and df_date_published fields index
just fine (so long as I do it as I've defined above).

Thank you,

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhiker's Guide to the Galaxy
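
Reading the stack trace, one plausible explanation for Ken's exception is that
sort_date_modified is sourced from modified_date, which the earlier
DateFormatTransformer rule has already converted into a java.util.Date; when the
transformer runs again it effectively sees that object's toString() form (the log
shows "Wed Nov 28 21:39:05 EST 2007"), which no longer matches a pattern like
yyyyMMddhhmmss. A minimal standalone sketch of that failure mode, plain JDK with no
Solr involved:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateParseDemo {
    public static void main(String[] args) throws ParseException {
        // First pass: the raw string from the XML feed parses fine.
        SimpleDateFormat source = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss'Z'");
        Date modified = source.parse("2001-12-04T19:38:01Z");

        // Second pass: the value is now a Date, so re-parsing its toString()
        // form (e.g. "Tue Dec 04 19:38:01 EST 2001", time zone dependent) with
        // the sortable pattern fails with a ParseException, matching the DIH log.
        SimpleDateFormat sortable = new SimpleDateFormat("yyyyMMddhhmmss");
        try {
            sortable.parse(modified.toString());
        } catch (ParseException expected) {
            System.out.println("Unparseable date: " + modified);
        }
    }
}

Note also that DateFormatTransformer parses strings into java.util.Date objects
rather than formatting dates out, so producing a yyyyMMddhhmmss-style long for a
SortableLongField likely needs a different approach -- for instance a
RegexTransformer that strips the non-digits from the raw _modified_date string.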


Re: Synchronizing Solr with a PostgreDB

2010-10-15 Thread Dennis Gearon
We're doing what was recommended. Nice to hear we're on the right path.

Yeah Postgres!
Yeah Solr/Lucene!

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 10/15/10, Juan Manuel Alvarez naici...@gmail.com wrote:

 From: Juan Manuel Alvarez naici...@gmail.com
 Subject: Re: Synchronizing Solr with a PostgreDB
 To: solr-user@lucene.apache.org
 Date: Friday, October 15, 2010, 1:04 PM
 Thanks for the quick response! =o)
 We will go with that approach.
 
 On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley a...@roxxor.co.uk
 wrote:
  I would not cross-reference Solr results with your database to merge them unless
  you want to spank your database, nor would I load Solr with all your data. What I
  have found is that the search results page is generally a small subset of the data
  relating to the fuller document/result, so I store only the data required to
  present the search results wholly from Solr. The user can then choose to click
  into a specific result, which uses just the database to present it.
 
  Use the DataImportHandler - define an XML config to import as many entities into
  your document as you need and map columns to fields in schema.xml. Use the wiki
  page on DIH - it's all there, as well as example config in the Solr distro.
 
  allistair
 
  On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez wrote:
 
   Hello everyone! I am new to Solr and Lucene and I would like to ask you a
   couple of questions.
 
   I am working on an existing system that has its data saved in a Postgre DB,
   and now I am trying to integrate Solr to use full-text search and faceted
   search, but I have a couple of doubts about it.
 
   1) I see two ways of storing the data and making the search:
   - Duplicate all the DB data in Solr, so complete results are returned
   from a search query, or...
   - Put in Solr just the data that I need to search and, after finding
   the elements with a Solr query, use the result to make a more specific
   query to the DB.
 
   Which is the way this is normally done?
 
   2) How do I synchronize Solr and Postgre? Do I have to use the
   DataImportHandler, or do I have to execute a command against Solr whenever
   I run an INSERT command into Postgre?
 
   Thanks for your time!
 
   Cheers!
   Juan M.
 



Re: Virtual field, Statistics

2010-10-15 Thread Lance Norskog
Please add a JIRA issue requesting this. A bunch of things are not
supported for functions: returning as a field value, for example.

On Thu, Oct 14, 2010 at 8:31 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
 Dear solr-user folks,

 I would like to use the stats module to perform very basic statistics
 (mean, min and max) which is actually working just fine.

 Nevertheless, I found a small limitation that bothers me a tiny bit: how to
 perform the exact same statistics on the result of a function query rather
 than on a field.

 Example :
 schema :
 - string : id
 - float : width
 - float : height
 - float : depth
 - string : color
 - float : price

 What I'd like to do is something like:
 select?price:[45.5 TO 99.99]&stats=on&stats.facet=color&stats.field={volume=product(product(width,
 height), depth)}
 I would expect to obtain :

 <lst name="stats">
  <lst name="stats_fields">
   <lst name="(product(product(width,height),depth))">
    <double name="min">...</double>
    <double name="max">...</double>
    <double name="sum">...</double>
    <long name="count">...</long>
    <long name="missing">...</long>
    <double name="sumOfSquares">...</double>
    <double name="mean">...</double>
    <double name="stddev">...</double>
    <lst name="facets">
     <lst name="color">
      <lst name="white">
       <double name="min">...</double>
       <double name="max">...</double>
       <double name="sum">...</double>
       <long name="count">...</long>
       <long name="missing">...</long>
       <double name="sumOfSquares">...</double>
       <double name="mean">...</double>
       <double name="stddev">...</double>
      </lst>
      <lst name="red">
       <double name="min">...</double>
       <double name="max">...</double>
       <double name="sum">...</double>
       <long name="count">...</long>
       <long name="missing">...</long>
       <double name="sumOfSquares">...</double>
       <double name="mean">...</double>
       <double name="stddev">...</double>
      </lst>
      <!-- Other facets on other colors go here -->
     </lst>
    </lst><!-- end of statistical facets on volumes -->
   </lst><!-- end of stats on volumes -->
  </lst><!-- end of stats_fields node -->
 </lst>

 Of course, computing the volume can be done before indexing the data, but
 defining virtual fields on the fly from an arbitrary function is powerful, and
 I am comfortable with the idea that many others would appreciate it, especially
 for BI needs and so on... :-D
 Is there an easy way to do this that I have not been able to find, or is it
 actually impossible?

 Thank you very much in advance for your help.

 --
 Tanguy




-- 
Lance Norskog
goks...@gmail.com
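
Until something along these lines exists, the index-time workaround Tanguy mentions
can be sketched as follows: compute the volume while building each document and
index it as its own float field (here called volume, which would have to be added to
the schema), so the existing stats component can be pointed at it. This is only an
illustrative SolrJ snippet, not a substitute for the requested feature:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithVolume {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        float width = 2.0f, height = 3.0f, depth = 4.0f;

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "sku-1");
        doc.addField("width", width);
        doc.addField("height", height);
        doc.addField("depth", depth);
        doc.addField("color", "white");
        doc.addField("price", 45.5f);
        // Precompute the derived value so stats.field can refer to a real field.
        doc.addField("volume", width * height * depth);

        solr.add(doc);
        solr.commit();
    }
}

A query such as select?q=price:[45.5 TO 99.99]&stats=on&stats.field=volume&stats.facet=color
should then return per-color statistics much like the response sketched above, at
the cost of fixing the formula at index time.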