Re: Solr expert(s) needed
2009/1/10 Lance Norskog:
> I have used the rss format of the data input handler, and it works well but
> has problems with detecting errors etc. That is, it works well when it works
> but does not fail gracefully in a useful way.

Lance, some error handling logic was added after you described your use-cases in a previous mail: https://issues.apache.org/jira/browse/SOLR-842

We also have a very simple event listener in DIH for import start and end. Probably we can add another for onError as well. https://issues.apache.org/jira/browse/SOLR-938

If there are other things that can help make DIH more robust, please do let us know.

--
Regards,
Shalin Shekhar Mangar.
Re: EmbeddedSolrServer in Single Core
On Jan 9, 2009, at 8:12 PM, qp19 wrote:
> Please bear with me. I am new to Solr. I have searched all the existing
> posts about this and could not find an answer. I wanted to know how do I go
> about creating a SolrServer using EmbeddedSolrServer. I tried to initialize
> this several ways but was unsuccessful. I do not have multi-core. I am
> using solrj 1.3. I attempted to use the deprecated methods as mentioned in
> the SolrJ documentation the following way, but it fails as well with
> "unable to locate Core".
>
>   SolrCore core = SolrCore.getSolrCore();

This function is deprecated and *really* should not be used -- especially for an embedded solr server. (The only chance you would have for it to work is if you start up solr in a web app before calling this.)

>   SolrServer server = new EmbeddedSolrServer( core );

Core initialization is kind of a mess, but this contains everything you would need:

  CoreContainer container = new CoreContainer(
      new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
  CoreDescriptor dcore = new CoreDescriptor(container, coreName,
      solrConfig.getResourceLoader().getInstanceDir());
  dcore.setConfigName(solrConfig.getResourceName());
  dcore.setSchemaName(indexSchema.getResourceName());
  SolrCore core = new SolrCore(null, dataDirectory, solrConfig, indexSchema, dcore);
  container.register(coreName, core, false);

> So far my installation is pretty basic with Solr running on Tomcat as per
> instructions in the wiki. My solr home is outside of the webapps folder,
> i.e. "c:/tomcat-solr/solr". I am able to connect using
> CommonsHttpSolrServer("http://localhost:8080/solr") without a problem.
>
> The question in a nutshell is: how do I instantiate EmbeddedSolrServer
> using new EmbeddedSolrServer(CoreContainer coreContainer, String coreName)?
> Initializing CoreContainer appears to be complicated when compared to
> SolrCore.getSolrCore() as per the examples. Is there a simpler way to
> initialize CoreContainer? Is a core (or coreName) necessary even though I
> don't use multi-core?
> Also, is it possible to initialize EmbeddedSolrServer using Spring? Thanks
> in advance for the help.

Yes, I use this:

  ${dir}
  ${dconfigFile}
  class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer"
  class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer"

ryan
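[Editor's note: the Spring XML in the reply above lost its tags in the archive, so only property placeholders and class names survive. A minimal sketch of what such wiring *might* look like is below. The bean names, the use of Solr 1.3's CoreContainer.Initializer as a factory, and the empty core name are all assumptions, not Ryan's original configuration.]

```xml
<!-- Hypothetical Spring wiring for EmbeddedSolrServer (Solr 1.3 era).
     Assumes solr.solr.home is set so CoreContainer.Initializer can find
     the config; bean names here are made up for illustration. -->
<bean id="coreContainerInitializer"
      class="org.apache.solr.core.CoreContainer$Initializer"/>

<!-- initialize() builds the CoreContainer from solr home -->
<bean id="coreContainer"
      factory-bean="coreContainerInitializer"
      factory-method="initialize"/>

<bean id="solrServer"
      class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer">
  <constructor-arg ref="coreContainer"/>
  <!-- core name; a single default core is typically registered under "" -->
  <constructor-arg value=""/>
</bean>
```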
Re: Solr expert(s) needed
I don't know about the Nutch format -> Solr schema idea either. The NUTCH-442 system uses Solr for both indexing and searching, and uses Nutch only for crawling.

At my last job we had a custom scripting system that crawled the front page of over 5000 sites. Each site had a configured script. Yes, it was complex. We also had custom crawlers for Youtube & myspace and some other sites which offered APIs, but in general it was all hand-coded.

I have used the rss format of the data input handler, and it works well but has problems with detecting errors etc. That is, it works well when it works but does not fail gracefully in a useful way.

Lance

2009/1/9 Tony Wang:
> Thanks Lance! I have no idea whether the Nutch-generated index could be
> converted to a Solr schema. I wonder what people are using this NUTCH-442
> for (http://issues.apache.org/jira/browse/NUTCH-442).
>
> So what crawler do you use to generate an index for Solr? Thanks a lot!!
>
> On Fri, Jan 9, 2009 at 8:04 PM, Lance Norskog wrote:
>> http://issues.apache.org/jira/browse/NUTCH-442
>>
>> Haven't used Nutch. Can the Nutch-generated index be reverse-engineered
>> into a Solr schema? In that case, you can just copy the Lucene index
>> files away from Nutch and run them under Solr.
>
> --
> Are you RCholic? www.RCholic.com
> 温 良 恭 俭 让 仁 义 礼 智 信
UUID field type documentation and ExtractingRequestHandler
The UUID field type is not documented on the Wiki.
https://issues.apache.org/jira/browse/SOLR-308

The ExtractingRequestHandler creates its own UUID instead of using the UUID field type.
http://issues.apache.org/jira/browse/SOLR-284
Re: Solr expert(s) needed
Thanks Lance! I have no idea whether the Nutch-generated index could be converted to a Solr schema. I wonder what people are using this NUTCH-442 for (http://issues.apache.org/jira/browse/NUTCH-442).

So what crawler do you use to generate an index for Solr? Thanks a lot!!

On Fri, Jan 9, 2009 at 8:04 PM, Lance Norskog wrote:
> http://issues.apache.org/jira/browse/NUTCH-442
>
> Haven't used Nutch. Can the Nutch-generated index be reverse-engineered
> into a Solr schema? In that case, you can just copy the Lucene index files
> away from Nutch and run them under Solr.

--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
Re: Solr expert(s) needed
http://issues.apache.org/jira/browse/NUTCH-442 Haven't used Nutch. Can the Nutch-generated index be reverse-engineered into a Solr schema? In that case, you can just copy the Lucene index files away from Nutch and run them under Solr.
RE: Ensuring documents indexed by autocommit
: Thanks again for your inputs.
: But then I am still stuck on the question that how do we ensure that
: document is successfully indexed. One option I see is search for every

Have faith. If the add completes successfully then the data made it to Solr, was indexed, and now lives in the index files. If the commit completes successfully then the index files have been flushed and checkpointed, so all new uses of them will see the data.

If you want to be sure your data is indexed, all you have to do is check that neither of those calls got an error (hence Shalin's point about doing the commit yourself instead of using autocommit, so you can actually test the response from the commit call).

But frankly, I wouldn't worry so much. (How do you ensure that rows are successfully stored when you do database updates?)

-Hoss
EmbeddedSolrServer in Single Core
Please bear with me. I am new to Solr. I have searched all the existing posts about this and could not find an answer. I wanted to know how do I go about creating a SolrServer using EmbeddedSolrServer. I tried to initialize this several ways but was unsuccessful. I do not have multi-core. I am using solrj 1.3. I attempted to use the deprecated methods as mentioned in the SolrJ documentation the following way, but it fails as well with "unable to locate Core".

  SolrCore core = SolrCore.getSolrCore();
  SolrServer server = new EmbeddedSolrServer( core );

So far my installation is pretty basic with Solr running on Tomcat as per instructions in the wiki. My solr home is outside of the webapps folder, i.e. "c:/tomcat-solr/solr". I am able to connect using CommonsHttpSolrServer("http://localhost:8080/solr") without a problem.

The question in a nutshell is: how do I instantiate EmbeddedSolrServer using new EmbeddedSolrServer(CoreContainer coreContainer, String coreName)? Initializing CoreContainer appears to be complicated when compared to SolrCore.getSolrCore() as per the examples. Is there a simpler way to initialize CoreContainer? Is a core (or coreName) necessary even though I don't use multi-core? Also, is it possible to initialize EmbeddedSolrServer using Spring?

Thanks in advance for the help.

--
View this message in context: http://www.nabble.com/EmbeddedSolrServer-in-Single-Core-tp21383525p21383525.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr/Lucene MoreLikeThis with RangeQuery
Hi,

Thanks for the help.

> If i'm understanding you correctly, you modified the MoreLikeThis class to
> include your new clause (using those two lines above) correct?

Yes. The time field is a "long" and so are the range variables, so the problem should not be related to that. If I construct the query by adding a ConstantScoreRangeQuery, nothing more, no results are returned. But I have not tried to add it to the filter part of the mlt-handler; I suspect that this would solve the problem.

However, after trying more alternatives, I think that adding &fq=time:[1230922259744+TO+1231440659744] to the mlt-url-request does actually add a time filter to the constructed MLT-query:

Query: (+kategori:nyheter titel:moderbolaget^2.0 artikel:moderbolaget titel:pininfarin^2.0 artikel:pininfarin titel:bilbygg^1.9725448 artikel:bilbygg^0.9862724 titel:huvudäg^1.9257689 artikel:huvudäg^0.9628844 titel:uddevall^1.9054867 artikel:uddevall^0.95274335 titel:majoritet^1.71646 artikel:majoritet^0.85823 titel:volvo^1.6696839 artikel:volvo^0.83484197 titel:italiensk^1.5226858 artikel:italiensk^0.7613429)~5

So, an mlt.fq does not seem to be necessary to implement, since the fq filter seems to be passed to the mlt-query. To use a long for the time field rather than a Field.Date is probably bad, but it seems to work at least for testing.

So, I think that my problem is solved. Thanks!

/Clas

On Fri, Jan 9, 2009 at 2:40 AM, Chris Hostetter wrote:
>
> : Solr/Lucene. I am in a situation where I think that I can improve the
> : quality of the LikeThis-documents significantly by restricting the
> : MoreLikeThis-query to documents where one field has its term in a
> : specified range. That is, I would like to add a RangeQuery to the
> : default MoreLikeThis query.
> [...]
> : I would like to also add a range restriction as,
> :
> :   rq = new ConstantScoreRangeQuery("time", startTimeString, endTimeString, true, true);
> :   query.add(rq, BooleanClause.Occur.MUST);
> :
> : This is all made in
> : contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java
> :
> : However, this does not work at all when running from Solr (no MLT
> : suggestions are returned). I suspect that the problem is that the
>
> If I'm understanding you correctly, you modified the MoreLikeThis class to
> include your new clause (using those two lines above), correct?
>
> If you aren't getting any results, I suspect it may be an issue of term
> value encoding ... is your "time" field a Solr DateField? What is the
> value of startTimeString and endTimeString? ... if you replace all of the
> MLT Query logic so that it's *just* the ConstantScoreRangeQuery, do you
> get any results?
>
> : does not perform a standard query, but a getDocList:
> :
> :   results.docList = searcher.getDocList(mltQuery, filters, null,
> :       start, rows, flags);
> :
> : and that this type of query does not handle a RangeQuery. Is this
> : correct, or what is the problem with adding a RangeQuery? Should it be
>
> A RangeQuery will work just fine. But in general the type of problem you
> are trying to solve could be more generally dealt with if the MLT code had
> a way to let people specify "filter" queries (like the existing "fq"
> param) to be applied to the MLT logic -- that way they wouldn't contribute
> to the relevancy ... it seems like it would be pretty easy to add an
> "mlt.fq" param for this purpose if you wanted to approach the problem
> that way as a more generic path -- but I'm not too familiar with the MLT
> code to say for certain what would be required, and I know the code is
> probably more complicated than it should be with the MoreLikeThisHandler
> and the MoreLikeThisComponent (I think there's a MoreLikeThisHelper that
> they share or something)
>
> -Hoss
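[Editor's note: pulling the thread's working solution into one place -- a MoreLikeThisHandler request with a filter query, as Clas reported. The hostname, handler path, and field names below are placeholders based on values appearing in the thread, not a verified configuration.]

```
http://localhost:8983/solr/mlt?q=id:1234
    &mlt.fl=titel,artikel
    &fq=time:[1230922259744 TO 1231440659744]
```

The fq parameter filters the documents the MLT query may return without contributing to their relevancy score, which is exactly the behavior the hypothetical mlt.fq parameter discussed above would provide.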
Re: Amount range and facet fields returns [facet_fields]
On Jan 8, 2009, at 9:29 AM, Yevgeniy Belman wrote:
> the response i get when executing only the following, produces no facet
> counts. It could be a bug.
>
> facet.query=[price:[* TO 500], price:[500 TO *]

That's an invalid query. If you want two ranges, use two facet.query parameters.

Erik
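[Editor's note: for reference, the two-parameter version Erik describes would look like this; the price bounds are taken from the question above.]

```
facet=true&facet.query=price:[* TO 500]&facet.query=price:[500 TO *]
```

Each facet.query parameter produces its own count in the facet_queries section of the response.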
Re: Boosting based on number of values in multiValued field?
On Jan 9, 2009, at 12:56 PM, Eric Kilby wrote:
> Each document has a multivalued field, with 1-n values in it (as many as
> 20). The actual values don't matter to me, but the number of values is a
> rough proxy for the quality of a record. I'd like to apply a very small
> boost based on the number of values in that field, so that among a set of
> similar documents the ones with more values will score higher and sort
> ahead of those with fewer values.

The simplest technique would be to have your indexer add another field with the count (or some boost factor based on it), and then leverage that. Perhaps even use the document boost capability at indexing time.

Erik
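[Editor's note: a sketch of Erik's suggestion in Solr's XML update format. The field names (tag, tagCount) and the boost value are made up for illustration; the count field would need to be declared in schema.xml.]

```xml
<add>
  <!-- index-time document boost derived from the value count (optional) -->
  <doc boost="1.02">
    <field name="id">doc-1</field>
    <field name="tag">a</field>
    <field name="tag">b</field>
    <field name="tag">c</field>
    <!-- the indexer writes the number of values into a separate field,
         usable later for sorting or function-query boosting -->
    <field name="tagCount">3</field>
  </doc>
</add>
```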
Re: Deduplication patch not working in nightly build
Hey Mark,

Sorry I was not specific enough; I meant that I have, and always had, autoCommit=false. I will do some more traces and tests. Will post if I have any new important thing to mention.

Thanks.

Marc Sturlese wrote:
>
> Hey Shalin,
>
> In the beginning (when the error was appearing) I had 32 and no
> maxBufferedDocs set
>
> Now I have: 32 50
>
> I think that setting maxBufferedDocs to 50 I am forcing more disk writing
> than I would like... but at least it works fine (but a bit slower,
> obviously).
>
> I keep saying that the most weird thing is that I don't have that problem
> using solr1.3, just with the nightly...
>
> Even though it's good that it works well now, it would be great if someone
> could give me an explanation why this is happening
>
> Shalin Shekhar Mangar wrote:
>>
>> On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>>
>>> hey there,
>>> I hadn't autoCommit set to true but I have it sorted! The error stopped
>>> appearing after setting the property maxBufferedDocs in solrconfig.xml.
>>> I can't exactly understand why but it just worked.
>>> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the
>>> same?
>>>
>> What I find strange is this line in the exception:
>> "Last packet sent to the server was 202481 ms ago."
>>
>> Something took very very long to complete and the connection got closed
>> by the time the next row was fetched from the opened resultset.
>>
>> Just curious, what was the previous value of maxBufferedDocs and what did
>> you change it to?
>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.

--
View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21378069.html
Sent from the Solr - User mailing list archive at Nabble.com.
Boosting based on number of values in multiValued field?
hi,

I'm looking through the list archives and the documentation on boost queries, and I don't see anything that matches this case. I have an index of documents, some of which are very similar but not identical. Therefore the scores are very close and the ordering is affected by somewhat arbitrary factors. When I do a query the similar documents come up close together, so that's a good start.

Each document has a multivalued field, with 1-n values in it (as many as 20). The actual values don't matter to me, but the number of values is a rough proxy for the quality of a record. I'd like to apply a very small boost based on the number of values in that field, so that among a set of similar documents the ones with more values will score higher and sort ahead of those with fewer values.

Is there currently a function or set of functions that can be applied to this use case? Or a place where I could build and contribute something? In that case I'd look for a starting point on where to look.

thanks,
Eric

--
View this message in context: http://www.nabble.com/Boosting-based-on-number-of-values-in-multiValued-field--tp21377250p21377250.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
You're basically writing segments more often now, and somehow avoiding a longer merge, I think. Also, likely, deduplication is probably adding enough extra data to your index to hit a sweet spot where a merge is too long. Or something to that effect - MySQL is especially sensitive to timeouts when doing a select * on a huge db in my testing.

I didn't understand your answer on the autocommit - I take it you are using it? Or no?

All a guess, but it definitely points to a merge taking a bit long and causing a timeout. I think you can relax the MySQL timeout settings if that is it.

I'd like to get to the bottom of this as well, so any other info you can provide would be great.

- Mark

Marc Sturlese wrote:
> Hey Shalin,
>
> In the beginning (when the error was appearing) I had 32 and no
> maxBufferedDocs set
>
> Now I have: 32 50
>
> I think that setting maxBufferedDocs to 50 I am forcing more disk writing
> than I would like... but at least it works fine (but a bit slower,
> obviously).
>
> I keep saying that the most weird thing is that I don't have that problem
> using solr1.3, just with the nightly...
>
> Even though it's good that it works well now, it would be great if someone
> could give me an explanation why this is happening
>
> Shalin Shekhar Mangar wrote:
>> On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>>> hey there,
>>> I hadn't autoCommit set to true but I have it sorted! The error stopped
>>> appearing after setting the property maxBufferedDocs in solrconfig.xml.
>>> I can't exactly understand why but it just worked.
>>> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the
>>> same?
>>
>> What I find strange is this line in the exception:
>> "Last packet sent to the server was 202481 ms ago."
>>
>> Something took very very long to complete and the connection got closed
>> by the time the next row was fetched from the opened resultset.
>>
>> Just curious, what was the previous value of maxBufferedDocs and what did
>> you change it to?
>
> --
> View this message in context:
> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
> Sent from the Solr - User mailing list archive at Nabble.com.

>> --
>> Regards,
>> Shalin Shekhar Mangar.
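[Editor's note: for readers hitting the same "Last packet sent to the server" failure, one way to relax the MySQL side that Mark mentions is through the JDBC URL on the DIH dataSource. The property below is a MySQL Connector/J connection option; the url, user, and password values are placeholders, and whether it helps in this exact merge-timeout scenario is an assumption.]

```xml
<!-- Sketch of a data-config.xml dataSource with a larger network timeout
     for streaming result sets (value in seconds). batchSize="-1" is the
     DIH hint to stream rows from MySQL instead of buffering them. -->
<dataSource driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb?netTimeoutForStreamingResults=3600"
            user="dbuser" password="dbpass"
            batchSize="-1"/>
```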
Re: Deduplication patch not working in nightly build
Hey Shalin,

In the beginning (when the error was appearing) I had 32 and no maxBufferedDocs set.

Now I have: 32 50

I think that setting maxBufferedDocs to 50 I am forcing more disk writing than I would like... but at least it works fine (but a bit slower, obviously).

I keep saying that the most weird thing is that I don't have that problem using solr1.3, just with the nightly...

Even though it's good that it works well now, it would be great if someone could give me an explanation why this is happening

Shalin Shekhar Mangar wrote:
>
> On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>
>> hey there,
>> I hadn't autoCommit set to true but I have it sorted! The error stopped
>> appearing after setting the property maxBufferedDocs in solrconfig.xml.
>> I can't exactly understand why but it just worked.
>> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same?
>>
> What I find strange is this line in the exception:
> "Last packet sent to the server was 202481 ms ago."
>
> Something took very very long to complete and the connection got closed by
> the time the next row was fetched from the opened resultset.
>
> Just curious, what was the previous value of maxBufferedDocs and what did
> you change it to?
>
>> --
>> View this message in context:
>> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21376235.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>
> hey there,
> I hadn't autoCommit set to true but I have it sorted! The error stopped
> appearing after setting the property maxBufferedDocs in solrconfig.xml. I
> can't exactly understand why but it just worked.
> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same?
>

What I find strange is this line in the exception:
"Last packet sent to the server was 202481 ms ago."

Something took very very long to complete and the connection got closed by the time the next row was fetched from the opened resultset.

Just curious, what was the previous value of maxBufferedDocs and what did you change it to?

> --
> View this message in context:
> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shalin Shekhar Mangar.
Re: Beginner: importing own data
You were searching for "1899", which is the value of the "date" field in the document you added. You need to specify q=date:1899 to search on the date field.

You can also use the <defaultSearchField> element in schema.xml to specify the field on which you'd like to search if no field name is specified in the query. Typically, one creates a catch-all field which copies data from all the fields you want to search on. http://wiki.apache.org/solr/SchemaXml#head-b80c539a0a01eef8034c3776e49e8fe1c064f496

Also look at the DisMax queries: http://wiki.apache.org/solr/DisMaxRequestHandler

On Fri, Jan 9, 2009 at 8:35 PM, phil cryer wrote:
> Otis
> Thanks for your reply, I wrote out a long email explaining the steps I
> took, and the results, but it was returned by the Solr-user email
> server stamped as spam. I've put my note on pastebin, you can see it
> here: http://pastebin.cryer.us/pastebin.php?show=m359e2e47
>
> I'd appreciate any feedback, I know I'm close to getting this working,
> just can't see what I'm missing.
>
> Thank you
>
> P
>
> On Thu, Jan 8, 2009 at 4:19 PM, Otis Gospodnetic wrote:
>> Phil,
>>
>> The easiest thing to do at this stage in the Solr learning experience is
>> to restart Solr (servlet container) and redo the search. Results should
>> start showing up then because this will effectively reopen the index.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> - Original Message
>>> From: phil cryer
>>> To: solr-user@lucene.apache.org
>>> Sent: Thursday, January 8, 2009 5:00:29 PM
>>> Subject: Beginner: importing own data
>>>
>>> So I have Solr running, I've run through the tutorials online, can
>>> import data from the example xml and see the results, so it works!
>>> Now, I take some xml data I have, convert it over to the add / doc
>>> type that the demo ones are, run it and find out which fields aren't
>>> defined in schema.xml, I add them there until they're all there and I
>>> can finally import my own xml into solr w/o error. But, when I go to
>>> query solr, it's not there. Again, I'm using the same procedure that
>>> I used on the example xml files, and they did the 'commit' at the end,
>>> so I'm doing something wrong.
>>>
>>> Is that all I need to do, define my fields in schema.xml and then
>>> import via post.jar? It seems to work, but no results are ever found
>>> by solr. I'm open to trying any debugging or whatever, I need to
>>> figure this out before I can start learning solr.
>>>
>>> Thanks
>>>
>>> P

--
Regards,
Shalin Shekhar Mangar.
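[Editor's note: a sketch of the catch-all setup Shalin describes, using field and type names from the stock example schema.xml; the copyField from "date" is the part specific to this thread, and treating a date as searchable text this way is an illustration, not a recommendation.]

```xml
<!-- catch-all field: indexed but not stored, accepts copies from many fields -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- copy the date field's content into the catch-all -->
<copyField source="date" dest="text"/>

<!-- queries with no explicit field (e.g. q=1899) now search "text" -->
<defaultSearchField>text</defaultSearchField>
```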
solr admin page throwing errors
Hi, I am using the solr admin page with index.jsp from:

  <%-- $Id: index.jsp 686780 2008-08-18 15:08:28Z yonik $ --%>

I am getting these errors. Any insight will be helpful.

HTTP Status 500 - javax.servlet.ServletException: java.lang.NoSuchFieldError: config

org.apache.jasper.JasperException: javax.servlet.ServletException: java.lang.NoSuchFieldError: config
  at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:532)
  at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:408)
  at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
  at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:687)
  at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:469)
  at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:403)
  at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:301)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:228)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:216)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:634)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:445)
  at java.lang.Thread.run(Thread.java:619)
Caused by: javax.servlet.ServletException: java.lang.NoSuchFieldError: config
  at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:855)
  at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:784)
  at org.apache.jsp.admin.index_jsp._jspService(index_jsp.java:324)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
  at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:384)
  ... 22 more
Caused

--
View this message in context: http://www.nabble.com/solr-admin-page-throwing-errors-tp21375221p21375221.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
hey there,
I hadn't autoCommit set to true but I have it sorted! The error stopped appearing after setting the property maxBufferedDocs in solrconfig.xml. I can't exactly understand why but it just worked. Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same?

Thanks

Marc Sturlese wrote:
>
> Hey there,
> I was using the Deduplication patch with the Solr 1.3 release and
> everything was working perfectly. Now I upgraded to a nightly build (20th
> December) to be able to use the new facet algorithm and other stuff, and
> Deduplication is not working any more. I have followed exactly the same
> steps to apply the patch to the source code. I am getting this error:
>
> WARNING: Error reading data
> com.mysql.jdbc.CommunicationsException: Communications link failure due to
> underlying exception:
>
> ** BEGIN NESTED EXCEPTION **
>
> java.io.EOFException
>
> STACKTRACE:
>
> java.io.EOFException
> at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
> at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
> at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
> at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
> at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
> at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294)
> at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189)
> at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225)
> at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
> at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76)
> at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388) > > > ** END NESTED EXCEPTION ** > Last packet sent to the server was 202481 ms ago. > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) > at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) > at > com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) > at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) > at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225) > at > org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407) > at > 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388) > Jan 5, 2009 10:06:16 AM org.apache.solr.handler.dataimport.JdbcDataSource > logError > WARNING: Exception while closing result set > com.mysql.jdbc.CommunicationsException: Communications link failure due to > underlying exception: > > ** BEGIN NESTED EXCEPTION ** > > java.io.EOFException > > STACKTRACE: > > java.io.EOFException > at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2351) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) > at com.mysql.jdbc.MysqlIO.nextRow(My
Re: Overlapping Replication Scripts
You do a commit in step 1 after the update, right? So if you configure Solr on the indexer to invoke snapshooter after a commit and optimize, then you would not need to invoke snapshooter explicitly using cron. snappuller doesn't do anything unless there is a new snapshot on the indexer. Bill On Fri, Jan 9, 2009 at 4:31 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Fri, Jan 9, 2009 at 4:28 AM, wojtekpia wrote: > > > > > What happens if I overlap the execution of my cron jobs? Do any of these > > scripts detect that another instance is already executing? > > > No, they don't. > > > > > > -- > > View this message in context: > > > http://www.nabble.com/Overlapping-Replication-Scripts-tp21362434p21362434.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
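[Editor's note: a sketch of the solrconfig.xml change Bill describes, based on the (commented-out) listener stanzas shipped in the example solrconfig.xml; the "dir" value is an assumption about your installation layout.]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- take a snapshot after every commit ... -->
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
  </listener>
  <!-- ... and after every optimize -->
  <listener event="postOptimize" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>
```

With this in place, the cron entry for snapshooter on the indexer becomes unnecessary; only snappuller/snapinstaller on the searchers still need scheduling.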
Re: Beginner: importing own data
Otis, Thanks for your reply. I wrote out a long email explaining the steps I took, and the results, but it was returned by the Solr-user email server stamped as spam. I've put my note on pastebin, you can see it here: http://pastebin.cryer.us/pastebin.php?show=m359e2e47 I'd appreciate any feedback, I know I'm close to getting this working, just can't see what I'm missing. Thank you P On Thu, Jan 8, 2009 at 4:19 PM, Otis Gospodnetic wrote: > Phil, > > The easiest thing to do at this stage in the Solr learning experience is to > restart Solr (servlet container) and redo the search. Results should start > showing up then because this will effectively reopen the index. > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: phil cryer >> To: solr-user@lucene.apache.org >> Sent: Thursday, January 8, 2009 5:00:29 PM >> Subject: Beginner: importing own data >> >> So I have Solr running, I've run through the tutorials online, can >> import data from the example xml and see the results, so it works! >> Now, I take some xml data I have, convert it over to the add / doc >> type that the demo ones are, run it and find out which fields aren't >> defined in schema.xml, I add them there until they're all there and I >> can finally import my own xml into solr w/o error. But, when I go to >> query solr, it's not there. Again, I'm using the same procedure that >> I used on the example xml files, and they did the 'commit' at the end, >> so I'm doing something wrong. >> >> Is that all I need to do, define my fields in schema.xml and then >> import via post.jar? It seems to work, but no results are ever found >> by solr. I'm open to trying any debugging or whatever, I need to >> figure this out before I can start learning solr. >> >> Thanks >> >> P > >
Re: Beginner: importing own data
Paul, I have looked at those, but want to learn how to do the easy things first. As I posted below, I can import example data and then search against it. Data that I've tried to import seems to import, but I can't search/find it. I want to know how to do this first, so if you have any idea, I would appreciate it. Thanks P On Thu, Jan 8, 2009 at 8:18 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote: > did you explore using SolrJ to index data? > http://wiki.apache.org/solr/Solrj > > or DataImportHandler. > http://wiki.apache.org/solr/DataImportHandler > > On Fri, Jan 9, 2009 at 3:49 AM, Otis Gospodnetic > wrote: >> Phil, >> >> The easiest thing to do at this stage in the Solr learning experience is to >> restart Solr (servlet container) and redo the search. Results should start >> showing up then because this will effectively reopen the index. >> >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Message >>> From: phil cryer >>> To: solr-user@lucene.apache.org >>> Sent: Thursday, January 8, 2009 5:00:29 PM >>> Subject: Beginner: importing own data >>> >>> So I have Solr running, I've run through the tutorials online, can >>> import data from the example xml and see the results, so it works! >>> Now, I take some xml data I have, convert it over to the add / doc >>> type that the demo ones are, run it and find out which fields aren't >>> defined in schema.xml, I add them there until they're all there and I >>> can finally import my own xml into solr w/o error. But, when I go to >>> query solr, it's not there. Again, I'm using the same procedure that >>> I used on the example xml files, and they did the 'commit' at the end, >>> so I'm doing something wrong. >>> >>> Is that all I need to do, define my fields in schema.xml and then >>> import via post.jar? It seems to work, but no results are ever found >>> by solr. I'm open to trying any debugging or whatever, I need to >>> figure this out before I can start learning solr. 
>>> >>> Thanks >>> >>> P >> >> > > > > -- > --Noble Paul >
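The add/commit cycle Phil and Paul are discussing boils down to two XML messages. A minimal sketch (field names are illustrative and must match your schema.xml):

```xml
<!-- mydata.xml: one or more documents wrapped in <add> -->
<add>
  <doc>
    <field name="id">doc-001</field>
    <field name="title">An example document</field>
  </doc>
</add>
```

Posting it with `java -jar post.jar mydata.xml` sends the `<add>` followed by a `<commit/>`. Until a commit happens and a new searcher is opened, added documents are invisible to queries, which is exactly the symptom described in this thread.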
Re: Solr on a multiprocessor machine
On Fri, Jan 9, 2009 at 12:18 AM, smock wrote: > In some ways I have a 'small index' (~8 million documents at the moment). > However, I have a lot of attributes (currently about 30, but I'm expecting > that number to keep growing) and am interested in faceting across all of > them for every search. OK, this is where you will become CPU bound (faceting on 30 fields). But if you will have any search traffic at all, you are better off going with non-distributed search on a single box than distributed on a single box. Distributed search also needs to do more work than non-distributed for faceting (in the form of over-requesting and facet refinement requests). If you are interested in why this extra work needs to be done, search for "refinement" in https://issues.apache.org/jira/browse/SOLR-303 -Yonik
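The single-box, non-distributed setup Yonik recommends needs nothing special: faceting over many attributes is simply one facet.field parameter per attribute. A sketch, with illustrative field names:

```
http://localhost:8983/solr/select?q=camera&rows=10
  &facet=true
  &facet.field=brand
  &facet.field=color
  &facet.field=price_range
```

Each additional facet.field adds CPU work on that one box, but avoids the over-requesting and refinement round trips that a distributed setup would add on top.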
Re: Deduplication patch not working in nightly build
I can't imagine why dedupe would have anything to do with this, other than what was said; it perhaps is taking a bit longer to get a document to the db, and it times out (maybe a long signature calculation?). Have you tried changing your MySQL settings to allow for a longer timeout? (Sorry, I'm not too up to date on what you have tried.) Also, are you using autocommit during the import? If so, you might try turning it off for the full import. - Mark Marc Sturlese wrote: Hey there, I have been stuck on this problem since three days ago and have no idea how to sort it out. I am using the nightly from a week ago, MySQL, and this driver and url: driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/my_db" I can use the deduplication patch with indexes of 200,000 docs with no problem. When I try a full-import with a db of 1,500,000 it stops indexing at doc number 15,000 approx, showing me the error posted below. Once I get the exception, I restart Tomcat and start a delta-import... this time everything works fine! I need to avoid this error in the full import. I have tried: url="jdbc:mysql://localhost/my_db?autoReconnect=true" to sort it out in case the connection was closed due to a long time until the next doc was indexed, but nothing changed... 
I keep having this:
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Error reading data
com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)

** END NESTED EXCEPTION **

Last packet sent to the server was 206097 ms ago.
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Exception while closing result set
com.mysql.jdbc.CommunicationsExcepti
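A workaround often suggested for this particular failure mode is to make the MySQL driver stream rows (DIH's batchSize="-1" maps to the driver's streaming fetch size) and to give the server a longer network timeout while streaming. The URL parameter below is a real Connector/J option, but the values and credentials are illustrative:

```xml
<!-- data-config.xml: stream rows instead of buffering the whole result set,
     and allow the server more time between packets while streaming -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/my_db?netTimeoutForStreamingResults=3600"
            batchSize="-1"
            user="db_user" password="db_pass"/>
```

Raising net_write_timeout on the MySQL server itself is the equivalent server-side knob.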
Re: Query regarding Spelling Suggestions
Can you put the full log (as short as possibly demonstrates the problem) somewhere where I can take a look? Likewise, can you share your schema? Also, does the spelling index exist under /data/index? If you open it w/ Luke, does it have entries? Thanks, Grant On Jan 8, 2009, at 11:30 PM, Deshpande, Mukta wrote: Yes. I send the build command as: http://localhost:8080/solr/select/?q=documnet&spellcheck=true&spellcheck.build=true&spellcheck.count=2&spellcheck.q=parfect&spellcheck.dictionary=dict The Tomcat log shows: Jan 9, 2009 9:55:19 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select/ params={spellcheck=true&q=documnet&spellcheck.q=parfect&spellcheck.dictionary=dict&spellcheck.count=2&spellcheck.build=true} hits=0 status=0 QTime=141 Even after sending the build command I do not get any suggestions. Can you please check. Thanks, ~Mukta -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, January 08, 2009 7:42 PM To: solr-user@lucene.apache.org Subject: Re: Query regarding Spelling Suggestions Did you send in the build command? See http://wiki.apache.org/solr/SpellCheckComponent On Jan 8, 2009, at 5:14 AM, Deshpande, Mukta wrote: Hi, I am using the Wordnet dictionary for spelling suggestions. The dictionary is converted to a Solr index with only one field, "word", and stored in location /data/syn_index, using the syns2Index.java program available at http://www.tropo.com/techno/java/lucene/wordnet.html I have added the "word" field in my schema.xml as <field name="word" type="textSpell" indexed="true" stored="true"/> My application data indexes are in /data I am trying to use solr.IndexBasedSpellChecker to get spelling suggestions. My spell check component is configured as: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">textSpell</str> <lst name="spellchecker"> <str name="name">dict</str> <str name="classname">solr.IndexBasedSpellChecker</str> <str name="field">word</str> <str name="characterEncoding">UTF-8</str> <str name="spellcheckIndexDir">./syn_index</str> </lst> </searchComponent> I have added this component to my standard request handler as: <str name="echoParams">explicit</str> ... <arr name="last-components"> <str>spellcheck</str> </arr> With the above configuration, I do not get any spelling suggestions. Can somebody help ASAP. 
Thanks, ~Mukta -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
RE: Ensuring documents indexed by autocommit
Thanks again for your inputs. But then I am still stuck on the question of how we ensure that a document is successfully indexed. One option I see is to search for every document sent to Solr. Or do we assume that autocommit always indexes all the documents successfully? Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 5:08 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 5:00 PM, Alexander Ramos Jardim < alexander.ramos.jar...@gmail.com> wrote: > Shalin, > > Just to remember that since he is indexing more documents than he has > memory available, it is a good thing to have autocommit set. Yes, sorry, I had assumed that he has enough memory on the solr server. If not, then autoCommit may improve performance. Thanks for pointing this out, Alexander. -- Regards, Shalin Shekhar Mangar.
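If per-document verification is really required, the option Siddharth mentions amounts to one query per uniqueKey once the commit (or autocommit) has gone through. A sketch, with an illustrative id value:

```
http://localhost:8080/solr/select?q=id:doc-001&fl=id&rows=1
```

A numFound of 1 confirms the document is committed and searchable. As discussed elsewhere in this thread, batching adds and issuing one explicit commit makes this check deterministic; with autocommit you would have to wait out the maxTime/maxDocs window before polling.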
Re: Deduplication patch not working in nightly build
Hey there, I have been stuck on this problem since three days ago and have no idea how to sort it out. I am using the nightly from a week ago, MySQL, and this driver and url: driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/my_db" I can use the deduplication patch with indexes of 200,000 docs with no problem. When I try a full-import with a db of 1,500,000 it stops indexing at doc number 15,000 approx, showing me the error posted below. Once I get the exception, I restart Tomcat and start a delta-import... this time everything works fine! I need to avoid this error in the full import. I have tried: url="jdbc:mysql://localhost/my_db?autoReconnect=true" to sort it out in case the connection was closed due to a long time until the next doc was indexed, but nothing changed... I keep having this:
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Error reading data
com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)

** END NESTED EXCEPTION **

Last packet sent to the server was 206097 ms ago.
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Exception while closing result set
com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2351)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
Re: Problem in Out Put of Search
Can you show us the query you are running and how you are indexing documents? 2009/1/9 rohit arora > > Hi, > > I have added one document only a single time, but the output provided by Lucene > gives me > the same document multiple times. > > If I specify rows=2, the output contains the same document 2 times. > If I specify rows=10, the output contains the same document 10 times. > > I have already defined the 'id' field as a uniqueKey in the schema.xml > > with regards > Rohit Arora > > --- On Fri, 1/9/09, Shalin Shekhar Mangar wrote: > From: Shalin Shekhar Mangar > Subject: Re: Problem in Out Put of Search > To: solr-user@lucene.apache.org > Date: Friday, January 9, 2009, 11:55 AM > > There are two documents in that response. Are you adding the same document > multiple times to Solr? > > You can also specify a uniqueKey in the schema.xml which will make sure > that > Solr keeps only one document for a given key and removes the duplicate > documents. > > In the response you have pasted, the 'id' field looks like it should > have > been defined as a uniqueKey. > > On Fri, Jan 9, 2009 at 11:12 AM, rohit arora > wrote: > > > > > Hi, > > > > It gives this output: > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. > > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. > > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > > > If you look, it provides the same record of (id,name,large_desc,small_desc) > > multiple times. > > > > I have attached the output in a (.txt) file. > > > > with regards > > Rohit Arora > > > > > > > > > > > > --- On *Thu, 1/8/09, Erik Hatcher * > wrote: > > > > From: Erik Hatcher > > Subject: Re: Problem in Out Put of Search > > To: solr-user@lucene.apache.org > > Date: Thursday, January 8, 2009, 7:10 PM > > > > > > Please provide an example of what you mean. > > What and how did you index? What > > was the query? > > > > Erik > > > > On Jan 8, 2009, at 8:34 AM, rohit arora wrote: > > > > > > > > Hi, > > > > > > I have installed solr lucene 1.3. I am facing a problem while searching: it > > does not provide multiple records. > > > > > > Instead of providing multiple records it provides a single record multiple > > times. > > > > > > with regards > > > Rohit Arora > > > > > > > > > > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. > > > > > -- Alexander Ramos Jardim
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 5:00 PM, Alexander Ramos Jardim < alexander.ramos.jar...@gmail.com> wrote: > Shalin, > > Just to remember that since he is indexing more documents than he has memory > available, it is a good thing to have autocommit set. Yes, sorry, I had assumed that he has enough memory on the solr server. If not, then autoCommit may improve performance. Thanks for pointing this out, Alexander. -- Regards, Shalin Shekhar Mangar.
Re: Ensuring documents indexed by autocommit
Shalin, Just to remember that since he is indexing more documents than he has memory available, it is a good thing to have autocommit set. 2009/1/9 Shalin Shekhar Mangar > On Fri, Jan 9, 2009 at 4:47 PM, Gargate, Siddharth > wrote: > > > But what you were suggesting is that I > > should call commit only after some time or after few number of > > documents, right? > > > Correct. If you are using Solrj client for indexing data, you can use the > SolrServer#add(Collection docs) method to add multiple > documents in a batch and then call commit. > > But unless you really need to commit in between adding documents, > committing > at the very end of the indexing process usually gives the best performance. > > -- > Regards, > Shalin Shekhar Mangar. > -- Alexander Ramos Jardim
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 4:47 PM, Gargate, Siddharth wrote: > But what you were suggesting is that I > should call commit only after some time or after few number of > documents, right? Correct. If you are using Solrj client for indexing data, you can use the SolrServer#add(Collection docs) method to add multiple documents in a batch and then call commit. But unless you really need to commit in between adding documents, committing at the very end of the indexing process usually gives the best performance. -- Regards, Shalin Shekhar Mangar.
RE: Ensuring documents indexed by autocommit
Sorry, for the previous question. What I meant was whether we can set the configuration from the code. But what you were suggesting is that I should call commit only after some time or after few number of documents, right? -Original Message- From: Gargate, Siddharth [mailto:sgarg...@ptc.com] Sent: Friday, January 09, 2009 4:43 PM To: solr-user@lucene.apache.org Subject: RE: Ensuring documents indexed by autocommit How do we set the maxDocs or maxTime for commit from the application? Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 4:34 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 4:20 PM, Gargate, Siddharth wrote: > Thanks Shalin for the reply. > I am working with the remote Solr server. I am using autocommit > instead of commit method call because I observed significant > performance improvement with autocommit. > Just wanted to make sure that callback functionality is currently not > available in Solr. > > You provide your own implementation of SolrEventListener to do a call back to your application in any way you need. I don't think using autoCommit gives a performance advantage over normal commits. Calling commit after each document is not a good idea since commit is an expensive operation. The only reason you are seeing better performance after autoCommit is because it is set to commit after 'X' number of documents or minutes. This is something you can do from your application as well. -- Regards, Shalin Shekhar Mangar.
RE: Ensuring documents indexed by autocommit
How do we set the maxDocs or maxTime for commit from the application? Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 4:34 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 4:20 PM, Gargate, Siddharth wrote: > Thanks Shalin for the reply. > I am working with the remote Solr server. I am using autocommit > instead of commit method call because I observed significant > performance improvement with autocommit. > Just wanted to make sure that callback functionality is currently not > available in Solr. > > You provide your own implementation of SolrEventListener to do a call back to your application in any way you need. I don't think using autoCommit gives a performance advantage over normal commits. Calling commit after each document is not a good idea since commit is an expensive operation. The only reason you are seeing better performance after autoCommit is because it is set to commit after 'X' number of documents or minutes. This is something you can do from your application as well. -- Regards, Shalin Shekhar Mangar.
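To answer the question directly: in Solr 1.3 maxDocs/maxTime are not settable through SolrJ; they live in solrconfig.xml on the server. The values below are illustrative:

```xml
<!-- solrconfig.xml, inside <updateHandler class="solr.DirectUpdateHandler2"> -->
<autoCommit>
  <maxDocs>10000</maxDocs> <!-- commit after this many pending documents -->
  <maxTime>60000</maxTime> <!-- ...or after this many milliseconds -->
</autoCommit>
```

Changing either value requires editing the config and restarting/reloading Solr, which is why controlling commits from the client, as Shalin suggests, is the more flexible option.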
Re: Querying based on term position possible?
2009/1/8 Otis Gospodnetic > Hello Mark, > > As for assigning different weight to fields, have a look at DisMax request > handler - > http://wiki.apache.org/solr/DisMaxRequestHandler#head-af452050ee272a1c88e2ff89dc0012049e69e180 > Field boosting should solve this issue too, right? > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: Mark Tovey > > To: solr-user@lucene.apache.org > > Sent: Thursday, January 8, 2009 12:16:39 PM > > Subject: Querying based on term position possible? > > > > I'm a relative newbie at Solr/Lucene so apologies if this question is > > overly simplistic. I have an index built and functioning as expected, > > but I am trying to build a query that can sort/score results based on > > the search term's position in the document, with a document appearing > > higher in the results list if the term appears earlier in the document. > > For example, "Red fox in the forest" would be scored over "My shoes are > > red today and my shirt is also red" if I search for the term "red". It > > seems to me that the default scoring algorithm is based more on the term > > frequency than term position, though this may be a simplistic > > interpretation. Does anyone on the list know if there is a way to > > achieve my desired results by structuring a query a certain way, or is > > this more of an indexing issue where I should have set a parameter(s) in > > my schema to a certain value? Any help is hugely appreciated as I have > > been puzzling away at this for the past couple of days with no success. > > > > > > > > Alternatively, is there a way to query on two fields for a search term > > with documents being placed higher in the results if the term occurs in > > field1 over field2? I ask this because one of the fields in my schema > > (title in this case) is deemed more important in our scenario than > > the "text" field (which holds the title plus the contents of the > > remainder of the document). 
I tried, for example, title:red text:red but > > again was stumped on the syntax to place an "importance" variable on > > field1 over field2. > > > > > > > > Of course, it may be that what I'm trying to accomplish is simply not > > doable with the Lucene engine, at which point feel free to point out the > > error of my ways ;) > > > > > > > > Regards, > > > > --Mark Tovey > > -- Alexander Ramos Jardim
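The field-weighting half of Mark's question maps directly onto the dismax qf parameter Otis points at. A sketch of a handler configuration, with illustrative boost values:

```xml
<!-- solrconfig.xml: a dismax handler that weights title matches over body matches -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^3.0 text^1.0</str>
  </lst>
</requestHandler>
```

A query like q=red&qt=dismax then scores a title hit three times as heavily as a body hit. The position half (ranking documents where the term occurs earlier) is not covered by boosting: out of the box, Lucene scoring uses term frequency and field length rather than position, so that part generally needs payloads or a custom Similarity.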
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 4:20 PM, Gargate, Siddharth wrote: > Thanks Shalin for the reply. > I am working with the remote Solr server. I am using autocommit instead > of commit method call because I observed significant performance > improvement with autocommit. > Just wanted to make sure that callback functionality is currently not > available in Solr. > > You can provide your own implementation of SolrEventListener to do a callback to your application in any way you need. I don't think using autoCommit gives a performance advantage over normal commits. Calling commit after each document is not a good idea since commit is an expensive operation. The only reason you are seeing better performance with autoCommit is because it is set to commit after 'X' number of documents or minutes. This is something you can do from your application as well. -- Regards, Shalin Shekhar Mangar.
RE: Ensuring documents indexed by autocommit
Thanks Shalin for the reply. I am working with the remote Solr server. I am using autocommit instead of commit method call because I observed significant performance improvement with autocommit. Just wanted to make sure that callback functionality is currently not available in Solr. Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 3:16 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 3:03 PM, Gargate, Siddharth wrote: > Hi all, >I am using CommonsHttpSolrServer to add documents to Solr. Instead > of explicitly calling commit for every document I have configured > autocommit in solrconfig.xml. But how do we ensure that the document > added is successfully indexed/committed on Solr side. Is there any > callback mechanism available where the callback method my application > will get called? I looked at the postCommit listener in solrconfig.xml > file but looks like it just supports execution of external executables. > Are you using embedded Solr? or is it on a remote machine? A callback would only work on the same JVM anyway. You can always call commit through CommonsHttpSolrServer and then do a query to check if the document you expect got indexed. Though, if all the add and commit calls were successful (i.e. returned HTTP 200), it is very unlikely that the document won't be indexed. -- Regards, Shalin Shekhar Mangar.
Re: Solr on a multiprocessor machine
On Jan 9, 2009, at 12:28 AM, smock wrote: I'm using 1.3 - are the nightly builds stable enough to use in production? Testing always recommended, and no official guarantees are made of course, but trunk is vastly superior to 1.3 in faceting performance. I'd use trunk (in fact I am) in production. Erik
Re: Problem in Out Put of Search
Rohit, I'd guess you don't have <uniqueKey> set to id in schema.xml. Erik On Jan 9, 2009, at 1:57 AM, rohit arora wrote: Hi, I have added one document only a single time, but the output provided by Lucene gives me the same document multiple times. If I specify rows=2, the output contains the same document 2 times. If I specify rows=10, the output contains the same document 10 times. I have already defined the 'id' field as a uniqueKey in the schema.xml with regards Rohit Arora --- On Fri, 1/9/09, Shalin Shekhar Mangar wrote: From: Shalin Shekhar Mangar Subject: Re: Problem in Out Put of Search To: solr-user@lucene.apache.org Date: Friday, January 9, 2009, 11:55 AM There are two documents in that response. Are you adding the same document multiple times to Solr? You can also specify a uniqueKey in the schema.xml which will make sure that Solr keeps only one document for a given key and removes the duplicate documents. In the response you have pasted, the 'id' field looks like it should have been defined as a uniqueKey. On Fri, Jan 9, 2009 at 11:12 AM, rohit arora wrote: Hi, It gives this output: 5.361002 8232 Quality Testing International Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. Profile for exhibit include Customer profiling; customer marketing; loyalty systems and operators; customer intelligence; market research and analysis; customer experience management; employee motivation and incentivising; data warehousing/ data mining; employee training; contact/call centre; customer service management; sales promotions and incentives; field marketing; CRM solutions. Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. 5.361002 8232 Quality Testing International Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. 
Profile for exhibit include Customer profiling; customer marketing; loyalty systems and operators; customer intelligence; market research and analysis; customer experience management; employee motivation and incentivising; data warehousing/data mining; employee training; contact/call centre; customer service management; sales promotions and incentives; field marketing; CRM solutions. Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. If you look, it provides the same record (id, name, large_desc, small_desc) multiple times. I have attached the output in a (.txt) file. with regards Rohit Arora --- On Thu, 1/8/09, Erik Hatcher wrote: From: Erik Hatcher Subject: Re: Problem in Out Put of Search To: solr-user@lucene.apache.org Date: Thursday, January 8, 2009, 7:10 PM Please provide an example of what you mean. What and how did you index? What was the query? Erik On Jan 8, 2009, at 8:34 AM, rohit arora wrote: Hi, I have installed Solr/Lucene 1.3. I am facing a problem: while searching, it does not provide multiple records. Instead of providing multiple records, it provides a single record multiple times. with regards Rohit Arora -- Regards, Shalin Shekhar Mangar.
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 3:03 PM, Gargate, Siddharth wrote: > Hi all, > I am using CommonsHttpSolrServer to add documents to Solr. Instead > of explicitly calling commit for every document I have configured > autocommit in solrconfig.xml. But how do we ensure that the document > added is successfully indexed/committed on the Solr side? Is there any > callback mechanism available where the callback method in my application > will get called? I looked at the postCommit listener in solrconfig.xml > file but it looks like it just supports execution of external executables. > Are you using embedded Solr, or is it on a remote machine? A callback would only work on the same JVM anyway. You can always call commit through CommonsHttpSolrServer and then do a query to check if the document you expect got indexed. Though, if all the add and commit calls were successful (i.e. returned HTTP 200), it is very unlikely that the document won't be indexed. -- Regards, Shalin Shekhar Mangar.
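Shalin's suggestion (commit, then query for the expected document) can be sketched as a small polling helper. This is an illustrative sketch, not a Solr API: the `check` supplier stands in for a real SolrJ query such as `server.query(new SolrQuery("id:123")).getResults().getNumFound() > 0`, and it uses a modern Java functional interface for brevity.

```java
import java.util.function.Supplier;

// Sketch: after calling commit(), poll a query until the expected document
// shows up or a timeout expires. The Supplier stands in for a real SolrJ
// query against the server.
public class WaitForCommit {
    public static boolean waitUntil(Supplier<Boolean> check,
                                    long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.get()) {
                return true;   // document found: the commit is visible
            }
            Thread.sleep(intervalMs);
        }
        return false;          // timed out: investigate the add/commit calls
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in check that becomes true after ~50 ms, simulating a
        // document that appears once the (auto)commit fires.
        long start = System.currentTimeMillis();
        boolean found = waitUntil(
                () -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println(found ? "indexed" : "timed out");
    }
}
```

As noted in the thread, if the add and commit calls all returned HTTP 200 this check will almost always succeed on the first poll; the loop mainly guards against autocommit latency.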
Re: 2 questions about solr spellcheck
On Fri, Jan 9, 2009 at 12:59 AM, Qingdi wrote: > > Hi, > > I use Solr 1.3 and I have two questions about spellcheck. > > 1) if my index docs are like: > > university1 > UNIVERSITY > > > street1, city1 > LOCATION > > is it possible to build the spell check dictionary using field "NAME" but > with filter "TYPE"="UNIVERSITY"? > That is, I only want to include the university name in the dictionary. What > is the proper way to implement this? > It is not possible out of the box. However, there are a couple of ways to do this. 1. You can create a copy field for 'NAME' (say 'NAME_SPELL') which has a value only if "TYPE"="UNIVERSITY" for the document. 2. You can create your own implementation of the IndexBasedSpellChecker and HighFrequencyDictionary which applies a filter query on "TYPE" and then uses the terms to create the dictionary. Option #1 would probably be the easiest if you care only about "TYPE"="UNIVERSITY". > 2) my current data index size is about 11G, and the spelling dictionary > index size is about 6 G. After adding the spell check component, will the > spell checking have any impact on the runtime query performance and memory > usage? Should I increase the memory allocation for the solr server? > I think the spelling index will have some impact. But the magnitude of the impact and the memory needed depends on a number of factors such as type of queries, query rate etc. > > Thanks for your help. > > Qingdi > -- > View this message in context: > http://www.nabble.com/2-questions-about-solr-spellcheck-tp21359183p21359183.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
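Option #1 above might look roughly like the sketch below (field and type names are illustrative, not from the original mail). Note that a plain <copyField> copies NAME unconditionally, so the "only if TYPE=UNIVERSITY" part has to happen on the indexing client, which sets NAME_SPELL itself; the spellchecker is then pointed at that field in solrconfig.xml:

```xml
<!-- schema.xml sketch: a separate field used only to build the
     spellcheck dictionary. The indexing client sets NAME_SPELL only
     when TYPE=UNIVERSITY; a plain copyField cannot filter on TYPE. -->
<field name="NAME_SPELL" type="textSpell" indexed="true" stored="false"/>

<!-- solrconfig.xml sketch: build the dictionary from NAME_SPELL -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">NAME_SPELL</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
```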
Ensuring documents indexed by autocommit
Hi all, I am using CommonsHttpSolrServer to add documents to Solr. Instead of explicitly calling commit for every document, I have configured autocommit in solrconfig.xml. But how do we ensure that the document added is successfully indexed/committed on the Solr side? Is there any callback mechanism available through which a callback method in my application will get called? I looked at the postCommit listener in the solrconfig.xml file, but it looks like it just supports execution of external executables. Thanks in advance, Siddharth
Re: Overlapping Replication Scripts
On Fri, Jan 9, 2009 at 4:28 AM, wojtekpia wrote: > > What happens if I overlap the execution of my cron jobs? Do any of these > scripts detect that another instance is already executing? No, they don't. > > -- > View this message in context: > http://www.nabble.com/Overlapping-Replication-Scripts-tp21362434p21362434.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
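Since the replication scripts do not detect overlapping runs themselves, one common guard (not part of the Solr distribution) is to wrap each cron entry in flock(1) so an overlapping run exits immediately instead of stacking up. The lock-file path and script name below are illustrative:

```shell
#!/bin/bash
# Sketch: serialize replication cron jobs with flock(1) (util-linux).
# In crontab you would write something like:
#   */5 * * * * flock -n /tmp/snappuller.lock /path/to/solr/bin/snappuller
# Demonstration: hold the lock in the background, then show that a second
# attempt bails out instead of running concurrently.
LOCKFILE="/tmp/solr-replication-demo.$$.lock"

( flock -n 9 && sleep 2 ) 9>"$LOCKFILE" &   # first "job" acquires the lock
holder=$!
sleep 0.5                                    # give it time to acquire

if flock -n "$LOCKFILE" -c true; then        # second "job" tries the lock
  status="acquired"                          # lock was free: safe to run
else
  status="busy"                              # previous run still going: skip
fi
echo "$status"

wait "$holder"
rm -f "$LOCKFILE"
```

With `-n`, flock fails fast instead of queueing, so an overlapped cron cycle is simply skipped rather than delayed.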
Re: Problem in Out Put of Search
Did you add the uniqueKey to schema.xml after indexing? If not, you need to re-index after changing the schema. Solr/Lucene does not duplicate documents by itself. How are you indexing documents to Solr? Did the example setup shipped with Solr work correctly for you? On Fri, Jan 9, 2009 at 12:27 PM, rohit arora wrote: > > Hi, > > I have add one document only single time but the out put provided by lucene > give me > the same document multiple times.. > > If i specify rows=2 in out put same document will be 2 times. > If i specify rows=10 in out put same document will be 10 times. > > I have already defined 'id' field as a uniqueKey in the schema.xml > > with regards > Rohit Arora > > --- On Fri, 1/9/09, Shalin Shekhar Mangar wrote: > From: Shalin Shekhar Mangar > Subject: Re: Problem in Out Put of Search > To: solr-user@lucene.apache.org > Date: Friday, January 9, 2009, 11:55 AM > > There are two documents in that response. Are you adding the same document > multiple times to Solr? > > You can also specify a uniqueKey in the schema.xml which will make sure > that > Solr keeps only one document for a given key and removes the duplicate > documents. > > In the response you have pasted, the 'id' field looks like it should > have > been defined as a uniqueKey. > > On Fri, Jan 9, 2009 at 11:12 AM, rohit arora > wrote: > > > > > Hi, > > > > It gives this out put .. > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. 
> > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. > > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > > > If you look it provide same record of (id,name,large_desc,small_desc) > > multiple times.. > > > > I have attached the out put in a (.txt) file.. > > > > with regards > > Rohit Arora > > > > > > > > > > > > --- On *Thu, 1/8/09, Erik Hatcher * > wrote: > > > > From: Erik Hatcher > > Subject: Re: Problem in Out Put of Search > > To: solr-user@lucene.apache.org > > Date: Thursday, January 8, 2009, 7:10 PM > > > > > > Please provide an example of what you mean. > > What and how did you index? What > > was the query? > > > > Erik > > > > On Jan 8, 2009, at 8:34 AM, rohit arora wrote: > > > > > > > > Hi, > > > > > > I have installed solr lucene 1.3. I am facing a problem wile > searching it > > did not provides multiple records. > > > > > > Instead of providing multiple records it provides single record > multiple > > times.. > > > > > > with regards > > > Rohit Arora > > > > > > > > > > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. > > > > > -- Regards, Shalin Shekhar Mangar.
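For reference, the relevant schema.xml pieces look roughly like this minimal sketch (the field type name is illustrative); as noted above, documents added before the <uniqueKey> was declared must be re-indexed for deduplication to take effect:

```xml
<!-- schema.xml sketch: define the key field and declare it as the
     uniqueKey. With this in place, re-adding a document with the same id
     overwrites the previous copy instead of creating a duplicate. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>
```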
Re: Problem with WT parameter when upgrading from Solr1.2 to solr1.3
yeah, finally I did it by modifying the required SolrDocumentList and using it instead of the DocList object as in Solr 1.2. Thanks Pooja On Fri, Jan 9, 2009 at 9:01 AM, Yonik Seeley wrote: > On Thu, Jan 8, 2009 at 9:40 PM, Chris Hostetter > wrote: > > you have a custom response writer you had working in > > Solr 1.2, and now you are trying to use that same custom response writer > in > > Solr 1.3 with distributed requests? > > Right, that's probably the crux of it - distributed search required > some extensions to response writers... things like handling > SolrDocument and SolrDocumentList. > > -Yonik >