Re: filter query from external list of Solr unique IDs

2010-10-15 Thread Chris Hostetter

: Hoss  mentioned a couple of ideas:
: 1) sub-classing query parser
: 2) Having the app query a database and somehow passing something 
: to Solr or lucene for the filter query

The approach I was referring to is something one of my coworkers did a 
while back (if he's still lurking on the list, maybe he'll speak up)

He implemented a custom "SqlFilterQuery" class that was constructed from a 
JDBC URL and a SQL statement.  The SqlFilterQuery class rewrote to itself (so it 
was a primitive query class) and returned a Scorer that would:

1) execute the SQL query (which should return a sorted list of uniqueKey 
field values) and retrieve a JDBC iterator (cursor?) over the results.
2) fetch a TermEnum from Lucene for the uniqueKey field
3) use the JDBC iterator to skip ahead on the TermEnum, getting the underlying 
Lucene docid for each uniqueKey and recording it in a DocSet

As I recall, my coworker was using this in a custom RequestHandler, where 
he was then forcibly putting that DocSet in the filterCache so that it 
would be there on future requests, and it would be regenerated by 
autoWarming (the advantage of implementing this logic using the Query 
interface) but it could also be done with a custom cache if you don't want 
these to contend for space in the filterCache.

My point about the query parser was that instead of needing to use a custom 
RequestHandler (or even a custom SearchComponent) to generate this DocSet 
for filtering, you could probably do it using a QParserPlugin -- that way 
you could use a regular "fq" param to generate the filter.  You could even 
generalize the hell out of it so the SQL itself could be specified at 
request time...

  q=solr&fq={!sql}SELECT ID FROM USER_MAP WHERE USER=1234 ORDER BY ID ASC
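A bare-bones sketch of what such a plugin might look like (purely illustrative: the class names, the init arg, and the SqlFilterQuery it returns are the hypothetical pieces described above, not an existing Solr parser):

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class SqlQParserPlugin extends QParserPlugin {
  private String jdbcUrl;

  public void init(NamedList args) {
    // hypothetical init arg from solrconfig.xml, e.g. a "jdbcUrl" string
    jdbcUrl = (String) args.get("jdbcUrl");
  }

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        // qstr is the SQL from the fq, e.g. SELECT ID FROM USER_MAP WHERE USER=1234 ORDER BY ID ASC
        return new SqlFilterQuery(jdbcUrl, qstr);  // the custom query class described above (not shown)
      }
    };
  }
}

Registered with a queryParser entry named "sql" in solrconfig.xml, that would give you the {!sql} syntax above.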



-Hoss


Re: "Virtual field", Statistics

2010-10-15 Thread Lance Norskog
Please add a JIRA issue requesting this. A bunch of things are not
supported for functions: returning as a field value, for example.
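In the meantime, the index-time workaround Tanguy mentions below is easy enough to script; a minimal SolrJ sketch (assuming "server" is an already-constructed SolrServer and a stored float field named "volume" has been added to the schema):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "box-42");
doc.addField("width", 2.0f);
doc.addField("height", 3.0f);
doc.addField("depth", 1.5f);
// precompute product(product(width, height), depth) before indexing
doc.addField("volume", 2.0f * 3.0f * 1.5f);
server.add(doc);
server.commit();

The stats component can then be pointed at the real "volume" field.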

On Thu, Oct 14, 2010 at 8:31 AM, Tanguy Moal  wrote:
> Dear solr-user folks,
>
> I would like to use the stats module to perform very basic statistics
> (mean, min and max) which is actually working just fine.
>
> Nevertheless I found a little limitation that bothers me a tiny bit:
> how to perform the exact same statistics, but on the result of a
> function query rather than a field.
>
> Example :
> schema :
> - string : id
> - float : width
> - float : height
> - float : depth
> - string : color
> - float : price
>
> What I'd like to do is something like :
> select?price:[45.5 TO
> 99.99]&stats=on&stats.facet=color&stats.field={volume=product(product(width,
> height), depth)}
> I would expect to obtain :
>
> 
>  
>  
>   ...
>   ...
>   ...
>   ...
>   ...
>   ...
>   ...
>   ...
>   
>    
>     
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>    
>    
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>    
>    
>   
>  
>  
> 
>
> Of course computing the volume can be performed before indexing data,
> but defining virtual fields on the fly given an arbitrary function is
> powerful, and I am comfortable with the idea that many others would
> appreciate it. Especially for BI needs and so on... :-D
> Is there a way to do it easily that I would have not been able to
> find, or is it actually impossible ?
>
> Thank you very much in advance for your help.
>
> --
> Tanguy
>



-- 
Lance Norskog
goks...@gmail.com


Re: Synchronizing Solr with a PostgreDB

2010-10-15 Thread Dennis Gearon
We're doing what was recommended. Nice to hear we're on the right path.

Yeah Postgres!
Yeah Solr/Lucene!

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 10/15/10, Juan Manuel Alvarez  wrote:

> From: Juan Manuel Alvarez 
> Subject: Re: Synchronizing Solr with a PostgreDB
> To: solr-user@lucene.apache.org
> Date: Friday, October 15, 2010, 1:04 PM
> Thanks for the quick response! =o)
> We will go with that approach.
> 
> On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley 
> wrote:
> > i would not cross-reference solr results with your
> database to merge unless you want to spank your database.
> nor would i load solr with all your data. what i have found
> is that the search results page is generally a small subset
> of data relating to the fuller document/result. therefore i
> store only the data required to present the search results
> wholly from solr. the user can choose to click into a
> specific result which then uses just the database to present
> it.
> >
> > use data import handler - define an xml config to
> import as many entities into your document as you need and
> map columns to fields in schema.xml. use the Wiki page on
> DIH - it's all there, as well as example config in the solr
> distro.
> >
> > allistair
> >
> > On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez
> wrote:
> >
> >> Hello everyone! I am new to Solr and Lucene and I
> would like to ask
> >> you a couple of questions.
> >>
> >> I am working on an existing system that has the
> data saved in a
> >> Postgre DB and now I am trying to integrate Solr
> to use full-text
> >> search and faceted search, but I am having a
> couple of doubts about
> >> it.
> >>
> >> 1) I see two ways of storing the data and make the
> search:
> >> - Duplicate all the DB data in Solr, so complete
> results are returned
> >> from a search query, or...
> >> - Put in Solr just the data that I need to search
> and, after finding
> >> the elements with a Solr query, use the result to
> make a more specific
> >> query to the DB.
> >>
> >> Which is the way this is normally done?
> >>
> >> 2) How do I synchronize Solr and Postgre? Do I
> have to use the
> >> DataImportHandler or when I do the INSERT command
> into Postgre, I have
> >> to execute a command into Solr?
> >>
> >> Thanks for your time!
> >>
> >> Cheers!
> >> Juan M.
> >
> >
>


SOLR DateTime and SortableLongField field type problems

2010-10-15 Thread Ken Stanley
Hello all,

I am using SOLR-1.4.1 with the DataImportHandler, and I am trying to follow
the advice from
http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about
converting date fields to SortableLong fields for better memory efficiency.
However, whenever I try to do this using the DateFormatTransformer, I get exceptions
when indexing for every row that tries to create my sortable fields.

In my schema.xml, I have the following definitions for the fieldType and
dynamicField:




In my dih.xml, I have the following definitions:


















The fields in question are in the formats:




2001-12-04T00:00:00Z


2001-12-04T19:38:01Z




The exception that I am receiving is:

Oct 15, 2010 6:23:24 PM
org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: "Wed Nov 28 21:39:05 EST 2007"
at java.text.DateFormat.parse(DateFormat.java:337)
at
org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
at
org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

I know that it has to be the SortableLong fields, because if I remove just
those two lines from my dih.xml, everything imports as I expect it to. Am I
doing something wrong? Mis-using the SortableLong and/or DateFormatTransformer? Is
this not supported in my version of SOLR? I'm not very experienced with
Java, so digging into the code would be a lost cause for me right now. I was
hoping that somebody here might be able to help point me in the
right/correct direction.

It should be noted that the modified_date and df_date_published fields index
just fine (so long as I do it as I've defined above).

Thank you,

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
Thanks Yonik,

Is this something you might have time to throw together, or could you give an outline of 
what needs to be thrown together?
Is this something that should be asked on the developer's list or discussed in 
SOLR 1715 or does it make the most sense to keep the discussion in this thread?

Tom

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, October 15, 2010 1:19 PM
To: solr-user@lucene.apache.org
Subject: Re: filter query from external list of Solr unique IDs

On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom  wrote:
> At the Lucene Revolution conference I asked about efficiently building a 
> filter query from an external list of Solr unique ids.
Yeah, I've thought about a special query parser and query to deal with
this (relatively) efficiently, both from a query perspective and a
memory perspective.

Should be pretty quick to throw together:
- comma separated list of terms (unique ids are a special case of this)
- in the query, store as a single byte array for efficiency
- sort the ids if they aren't already sorted
- do lookups with a term enumerator and skip weighting or anything
else like that
- configurable caching... may, or may not want to cache this big query

That's only part of the stuff you mention, but seems like it would be
useful to a number of people.

-Yonik
http://www.lucidimagination.com


Re: Disable (or prohibit) per-field overrides

2010-10-15 Thread Chris Hostetter

: Anyone knows useful method to disable or prohibit the per-field override 
: features for the search components? If not, where to start to make it 
: configurable via solrconfig and attempt to come up with a working patch?

If your goal is to prevent *clients* from specifying these (while you're 
still allowed to use them in your defaults) then the simplest solution is 
probably something external to Solr -- along the lines of mod_rewrite.

Internally...

that would be tough.

You could probably write a SearchComponent (configured to run "first") 
that does it fairly easily -- just wrap the SolrParams in an impl that 
returns null anytime a component asks for a param name that starts with 
"f." (and excludes those param names when asked for a list of the param 
names) 
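Something along these lines, perhaps (an untested sketch against the 1.4-era APIs; the class name is made up):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class StripPerFieldParams extends SearchComponent {

  public void prepare(ResponseBuilder rb) throws IOException {
    final SolrParams original = rb.req.getParams();
    rb.req.setParams(new SolrParams() {
      public String get(String param) {
        return param.startsWith("f.") ? null : original.get(param);
      }
      public String[] getParams(String param) {
        return param.startsWith("f.") ? null : original.getParams(param);
      }
      public Iterator<String> getParameterNamesIterator() {
        List<String> names = new ArrayList<String>();
        Iterator<String> it = original.getParameterNamesIterator();
        while (it.hasNext()) {
          String name = it.next();
          if (!name.startsWith("f.")) names.add(name);  // hide per-field overrides
        }
        return names.iterator();
      }
    });
  }

  public void process(ResponseBuilder rb) throws IOException { }

  public String getDescription() { return "hides f.* per-field override params"; }
  public String getSource() { return null; }
  public String getSourceId() { return null; }
  public String getVersion() { return null; }
}

(listed in the handler's first-components so it runs before the standard components see the params)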


It could probably be generalized to support arbitrary rules in a way 
that might be handy for other folks, but it would still just be 
wrapping all of the params, so it would prevent you from using them 
in your config as well.

Ultimately I think a general solution would need to be in 
RequestHandlerBase ... where it wraps the request params using the 
defaults and invariants ... you'd want the custom exclusion rules to apply 
only to the request params from the client.




-Hoss


Re: Question related to phrase search in lucene/solr?

2010-10-15 Thread Chris Hostetter

: I have a question: is it possible to perform a phrase search with wild cards in 
: solr/lucene? I have two queries that both have exactly the same results: one is
: +Contents:"change market"
: 
: and the other is 
: +Contents:"chnage* market"
: 
: but I think the second should match "changes market" as well, but it does not 
: match it. Any help would be appreciated

In my experience, 90% of the time people ask about using wildcards in a 
phrase query what they really want is simple stemming of the terms -- the 
one example you've cited is an example of this.  If your "Contents" field 
uses an analyzer that does stemming then "change market" and "changes 
market" would both match.



-Hoss


Re: having problem about Solr Date Field.

2010-10-15 Thread Chris Hostetter

: So, regarding DST, do you put everything in GMT, and make adjustments 
: for it in the 'search for/between' date/time values before the query for 
: both DST and TZ?

The client adding docs is the only one that knows what TZ it's in when it 
formats the docs to add them, and the client issuing the query is the 
only one that knows what TZ it's in when it formats the query string to 
execute the query.  In both cases the client must use the UTC TZ when 
formatting the date strings so that Solr can deal with them correctly.
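For example, a client in any local time zone can produce the string Solr expects with something like this (a minimal sketch):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDates {
  // Solr expects ISO-8601 in UTC, e.g. 2010-10-15T19:38:01Z
  public static String format(Date d) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));  // convert from the client's local TZ (DST included)
    return fmt.format(d);
  }
}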


-Hoss


Re: ant build problem

2010-10-15 Thread Chris Hostetter

: i updated my solr trunk to revision 1004527. when i go for compiling
: the trunk with ant i get so many warnings, but the build is successful. the

Most of these warnings are legitimate; the problems have always been 
there, but recently the Lucene build file was updated to warn about them 
by default.

This one though...
: [javac] warning: [path] bad path element
: "/usr/share/ant/lib/hamcrest-core.jar": no such file or directory

...that's something specific to your setup.  Something in your system's ant 
configs thinks that jar should be there.

: After the compiling i thought to check with the ant test and performed but
: it is failed..

Failing tests are also a possibility ... there are several tests in the 
code base right now that fail sporadically (especially because of recent 
changes to the build system designed to get tests that *might* fail 
based on locale to fail more often) and people are working on them -- 
without full details about what failures you got though, we can't say if they 
are known issues.


-Hoss


Re: Solr with example Jetty and score problem

2010-10-15 Thread Chris Hostetter

: Thanks. But do you have any suggest or work-around to deal with it?

Posted in SOLR-2140



...this key is to make sure Solr knows "score" is not multiValued


-Hoss


Re: SOLRJ - Searching text in all fields of a Bean

2010-10-15 Thread Ahmet Arslan
You can replace query.setQueryType("dismax") with query.set("defType", 
"dismax");

Also don't forget to request the title field with the fl parameter. 
query.addField("title");



  

Re: Synchronizing Solr with a PostgreDB

2010-10-15 Thread Juan Manuel Alvarez
Thanks for the quick response! =o)
We will go with that approach.

On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley  wrote:
> i would not cross-reference solr results with your database to merge unless 
> you want to spank your database. nor would i load solr with all your data. 
> what i have found is that the search results page is generally a small subset 
> of data relating to the fuller document/result. therefore i store only the 
> data required to present the search results wholly from solr. the user can 
> choose to click into a specific result which then uses just the database to 
> present it.
>
> use data import handler - define an xml config to import as many entities 
> into your document as you need and map columns to fields in schema.xml. use 
> the Wiki page on DIH - it's all there, as well as example config in the solr 
> distro.
>
> allistair
>
> On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez wrote:
>
>> Hello everyone! I am new to Solr and Lucene and I would like to ask
>> you a couple of questions.
>>
>> I am working on an existing system that has the data saved in a
>> Postgre DB and now I am trying to integrate Solr to use full-text
>> search and faceted search, but I am having a couple of doubts about
>> it.
>>
>> 1) I see two ways of storing the data and make the search:
>> - Duplicate all the DB data in Solr, so complete results are returned
>> from a search query, or...
>> - Put in Solr just the data that I need to search and, after finding
>> the elements with a Solr query, use the result to make a more specific
>> query to the DB.
>>
>> Which is the way this is normally done?
>>
>> 2) How do I synchronize Solr and Postgre? Do I have to use the
>> DataImportHandler or when I do the INSERT command into Postgre, I have
>> to execute a command into Solr?
>>
>> Thanks for your time!
>>
>> Cheers!
>> Juan M.
>
>


Re: facet.field :java.lang.NullPointerException

2010-10-15 Thread Yonik Seeley
This is https://issues.apache.org/jira/browse/SOLR-2142
I'll look into it soon.
-Yonik
http://www.lucidimagination.com



On Fri, Oct 15, 2010 at 3:12 PM, Pradeep Singh  wrote:
> Faceting blows up when the field has no data. And this seems to be random.
> Sometimes it will work even with no data, other times not. Sometimes the
> error goes away if the field is set to multiValued=true (even though it's
> one value every time), other times it doesn't. In all cases setting
> facet.method to enum takes care of the problem. If this param is not set,
> the default leads to null pointer exception.
>
>
> 09:18:52,218 SEVERE [SolrCore] Exception during facet.field of
> xyz:java.lang.NullPointerException
>
>      at java.lang.System.arraycopy(Native Method)
>
>      at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247)
>
>      at
> org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164)
>
>      at
> org.apache.solr.request.NumberedTermsEnum.<init>(UnInvertedField.java:960)
>
>      at
> org.apache.solr.request.TermIndex$1.<init>(UnInvertedField.java:1151)
>
>      at
> org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151)
>
>      at
> org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204)
>
>      at
> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:188)
>
>      at
> org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911)
>
>      at
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298)
>
>      at
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354)
>
>      at
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190)
>
>      at
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
>
>      at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)
>
>      at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>
>      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
>
>      at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
>
>      at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
>                at
>


facet.field :java.lang.NullPointerException

2010-10-15 Thread Pradeep Singh
Faceting blows up when the field has no data. And this seems to be random.
Sometimes it will work even with no data, other times not. Sometimes the
error goes away if the field is set to multiValued=true (even though it's
one value every time), other times it doesn't. In all cases setting
facet.method to enum takes care of the problem. If this param is not set,
the default leads to a null pointer exception.
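For reference, the facet.method=enum workaround looks like this from SolrJ (the field name "xyz" just follows the trace below):

SolrQuery query = new SolrQuery("*:*");
query.setFacet(true);
query.addFacetField("xyz");
query.set("facet.method", "enum");  // sidesteps the UnInvertedField (fc) code path shown in the trace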


09:18:52,218 SEVERE [SolrCore] Exception during facet.field of
xyz:java.lang.NullPointerException

  at java.lang.System.arraycopy(Native Method)

  at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247)

  at
org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164)

  at
org.apache.solr.request.NumberedTermsEnum.<init>(UnInvertedField.java:960)

  at
org.apache.solr.request.TermIndex$1.<init>(UnInvertedField.java:1151)

  at
org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151)

  at
org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204)

  at
org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:188)

  at
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911)

  at
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298)

  at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354)

  at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190)

  at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)

  at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)

  at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)

  at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)

  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at


RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
Hi Jonathan,

The advantages of the obvious approach you outline are that it is simple, it 
fits into the existing Solr model, it doesn't require any customization or 
modification to Solr/Lucene java code.  Unfortunately, it does not scale well.  
We originally tried just what you suggest for our implementation of Collection 
Builder.  For a user's personal collection we had a table that maps the 
collection id to the unique Solr ids.
Then when they wanted to search their collection, we just took their search and 
added a filter query with the fq=(id:1 OR id:2 OR).   I seem to remember 
running into a limit on the number of OR clauses allowed. Even if you can set 
that limit larger, there are a number of efficiency issues.  

We ended up constructing a separate Solr index where we have a multi-valued 
collection number field. Unfortunately, until incremental field updating gets 
implemented, this means that every time someone adds a document to a 
collection, the entire document (including 700KB of OCR) needs to be re-indexed 
just to update the collection number field. This approach has allowed us to 
scale up to a total of something under 100,000 documents, but we don't think we 
can scale it much beyond that for various reasons.

I was actually thinking of some kind of custom Lucene/Solr component that would 
for example take a query parameter such as &lookitUp=123 and the component 
might do a JDBC query against a database or kv store and return results in some 
form that would be efficient for Solr/Lucene to process. (Of course this 
assumes that a JDBC query would be more efficient than just sending a long list 
of ids to Solr).  The other part of the equation is mapping the unique Solr ids 
to internal Lucene ids in order to implement a filter query.   I was wondering 
if something like the unique id to Lucene id mapper in zoie might be useful or 
if that is too specific to zoie. This may be totally off-base, since I 
haven't looked at the zoie code at all yet.

In our particular use case, we might be able to build some kind of in-memory 
map after we optimize an index and before we mount it in production. In our 
workflow, we update the index and optimize it before we release it and once it 
is released to production there is no indexing/merging taking place on the 
production index (so the internal Lucene ids don't change.)  

Tom



-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Friday, October 15, 2010 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: filter query from external list of Solr unique IDs

Definitely interested in this. 

The naive obvious approach would be just putting all the ID's in the query. 
Like fq=(id:1 OR id:2 OR).  Or making it another clause in the 'q'.  

Can you outline what's wrong with this approach, to make it more clear what's 
needed in a solution?



RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Demian Katz
The main problem I've encountered with the "lots of OR clauses" approach is 
that you eventually hit the limit on Boolean clauses and the whole query fails. 
 You can keep raising the limit through the Solr configuration, but there's 
still a ceiling eventually.

- Demian

> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Friday, October 15, 2010 1:07 PM
> To: solr-user@lucene.apache.org
> Subject: RE: filter query from external list of Solr unique IDs
> 
> Definitely interested in this.
> 
> The naive obvious approach would be just putting all the ID's in the
> query. Like fq=(id:1 OR id:2 OR).  Or making it another clause in
> the 'q'.
> 
> Can you outline what's wrong with this approach, to make it more clear
> what's needed in a solution?
> 
> From: Burton-West, Tom [tburt...@umich.edu]
> Sent: Friday, October 15, 2010 11:49 AM
> To: solr-user@lucene.apache.org
> Subject: filter query from external list of Solr unique IDs
> 
> At the Lucene Revolution conference I asked about efficiently building
> a filter query from an external list of Solr unique ids.
> 
> Some use cases I can think of are:
> 1)  personal sub-collections (in our case a user can create a small
> subset of our 6.5 million doc collection and then run filter queries
> against it)
> 2)  tagging documents
> 3)  access control lists
> 4)  anything that needs complex relational joins
> 5)  a sort of alternative to incremental field updating (i.e.
> update in an external database or kv store)
> 6)  Grant's clustering cluster points and similar apps.
> 
> Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't
> seem to be any work on it yet.
> 
> Hoss  mentioned a couple of ideas:
> 1) sub-classing query parser
> 2) Having the app query a database and somehow passing
> something to Solr or lucene for the filter query
> 
> Can Hoss or someone else point me to more detailed information on what
> might be involved in the two ideas listed above?
> 
> Is somehow keeping an up-to-date map of unique Solr ids to internal
> Lucene ids needed to implement this or is that a separate issue?
> 
> 
> Tom Burton-West
> http://www.hathitrust.org/blogs/large-scale-search
> 
> 
> 



Re: Term is duplicated when updating a document

2010-10-15 Thread Erick Erickson
This is actually known behavior. The problem is that when you update
a document, it's re-added and the original is only marked as deleted.
However, the terms aren't touched; both the original and the new
document's terms are counted. It'd be hard, very hard, to remove
the terms from the inverted index efficiently.

But when you optimize, all the deleted documents (and their associated
terms) are physically removed from the files, thus your term counts change.
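A tiny SolrJ sketch of that sequence (assuming "server" is your SolrServer and "id" is the uniqueKey):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "prod-1");
doc.addField("name", "blue widget");
server.add(doc);      // first version of the document
server.commit();

doc.setField("name", "blue widget, revised");
server.add(doc);      // same uniqueKey: the old copy is only *marked* deleted
server.commit();      // term counts now also include the deleted copy

server.optimize();    // physically removes deleted docs; counts are correct again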

HTH
Erick

On Fri, Oct 15, 2010 at 10:05 AM, Thomas Kellerer wrote:

> Thanks for the answer.
>
>
>  Which fields are modified when the document is updated/replaced.
>>
>
> Only one field was changed, but it was not the one where the auto-suggest
> term is coming from.
>
>
>  Are there any differences in the content of the fields that you are using
>> for the AutoSuggest.
>>
> No
>
>
>  Have you changed you schema.xml file recently? If you have, then there may
>> have been changes in the way these fields are analyzed and broken down to
>> terms.
>>
>
> No, I did a complete index rebuild to rule out things like that.
> Then after startup, did a search, then updated the document and did a
> search again.
>
> Regards
> Thomas
>
>
>
>> This may be a bug if you did not change the field or the schema file but
>> the
>> terms count is changing.
>>
>> On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer
>>  wrote:
>>
>>  Hi,
>>>
>>> we are updating our documents (that represent products in our shop) when
>>> a
>>> dealer modifies them, by calling
>>> SolrServer.add(SolrInputDocument) with the updated document.
>>>
>>> My understanding is, that there is no other way of updating an existing
>>> document.
>>>
>>>
>>> However we also use a term query to autocomplete the search field for the
>>> user, but each time adocument is updated (added) the term count is
>>> incremented. So after starting with a new index the count is e.g. 1, then
>>> the document (that contains that term) is updated, and the count is 2,
>>> the
>>> next update will set this to 3 and so on.
>>>
>>> Once the index is optimized (by calling SolrServer.optimize()) the count is
>>> correct again.
>>>
>>> Am I missing something or is this a bug in Solr/Lucene?
>>>
>>> Thanks in advance
>>> Thomas
>>>
>>>
>>>
>>
>>
>
>


Re: weighted facets

2010-10-15 Thread Peter Karich
Hi,

answering my own question(s).

Result grouping could be the solution as I explained here:
https://issues.apache.org/jira/browse/SOLR-385

> http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf (the file is dated to Aug 
> 2008)

yonik implemented this here:
https://issues.apache.org/jira/browse/SOLR-153

So, really cool: he's the inventor/first-thinker of their 'bitset tree'
! :-)
http://search.lucidimagination.com/search/document/6ccbec5e602687ae/facet_optimizing#6ccbec5e602687ae

Regards,
Peter.

> Hi,
>
> I need a feature which is well explained from Mr Goll at this site **
>
> So, it then would be nice to do sth. like:
>
> facet.stats=sum(fieldX)&facet.stats.sort=fieldX
>
> And the output (sorted against the sum-output) can look sth. like this:
> 
>  
>
>  767
>  892
>
> Is there something similar or was this answered from Hoss at the lucene
> revolution? If not I'll open a JIRA issue ...
>
>
> BTW: is the work from
> http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf contributed back to
> solr?
>
>
> Regards,
> Peter.
>
>
>
> PS: Related issue:
> https://issues.apache.org/jira/browse/SOLR-680
> https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch
>
>
>
> **
> http://lucene.crowdvine.com/posts/14137409
>
> Quoting his question in case the site goes offline:
>
> Hi Chris,
>
> Usually a facet search returns the document count for the
> unique values in the facet field. Is there a way to
> return a weighted facet count based on a user-defined function (sum,
> product, etc.) of another field?
>
> Here is a sum example. Assume we have the following
> 4 documents with 3 fields
>
> ID facet_field weight_field
> 1 solr 0.4
> 2 lucene 0.3
> 3 lucene 0.1
> 4 lucene 0.2
>
> Is there a way to return
>
> solr 0.4
> lucene 0.6
>
> instead of
>
> solr 1
> lucene 3
>
> Given the facet_field contains multiple values
>
> ID facet_field weight_field
> 1 solr lucene 0.2
> 2 lucene 0.3
> 3 solr lucene 0.1
> 4 lucene 0.2
>
> Is there a way to return
>
> solr 0.3
> lucene 0.8
>
> instead of
>
> solr 2
> lucene 4
>
> Thanks,
> Johannes
>
>   


-- 
http://jetwick.com twitter search prototype



Re: Sorting on arbitary 'custom' fields

2010-10-15 Thread Simon Wistow
On Mon, Oct 11, 2010 at 07:17:43PM +0100, me said:
> It was just an idea though and I was hoping that there would be a 
> simpler more orthodox way of doing it.

In the end, for anyone who cares, we used dynamic fields.

There are a lot of them but we haven't seen performance impacted that 
badly so far.






Re: filter query from external list of Solr unique IDs

2010-10-15 Thread Yonik Seeley
On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom  wrote:
> At the Lucene Revolution conference I asked about efficiently building a 
> filter query from an external list of Solr unique ids.

Yeah, I've thought about a special query parser and query to deal with
this (relatively) efficiently, both from a query perspective and a
memory perspective.

Should be pretty quick to throw together:
- comma separated list of terms (unique ids are a special case of this)
- in the query, store as a single byte array for efficiency
- sort the ids if they aren't already sorted
- do lookups with a term enumerator and skip weighting or anything
else like that
- configurable caching... may, or may not want to cache this big query

That's only part of the stuff you mention, but seems like it would be
useful to a number of people.
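For the lookup step, something roughly like this (a sketch against the Lucene 2.9 APIs Solr 1.4 uses; "reader" and the sorted id list are assumed to be in scope, "id" stands in for the uniqueKey field, and it seeks a TermDocs per term rather than walking a raw TermEnum, but the idea is the same):

OpenBitSet bits = new OpenBitSet(reader.maxDoc());
TermDocs termDocs = reader.termDocs();
for (String id : sortedIds) {           // the sorted ids from the request
  termDocs.seek(new Term("id", id));    // uniqueKey, so at most one matching doc
  if (termDocs.next()) {
    bits.set(termDocs.doc());
  }
}
termDocs.close();
DocSet filter = new BitDocSet(bits);    // cache it (or not) as discussed above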

-Yonik
http://www.lucidimagination.com


RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Jonathan Rochkind
Definitely interested in this. 

The naive obvious approach would be just putting all the ID's in the query. 
Like fq=(id:1 OR id:2 OR).  Or making it another clause in the 'q'.  

Can you outline what's wrong with this approach, to make it more clear what's 
needed in a solution?

From: Burton-West, Tom [tburt...@umich.edu]
Sent: Friday, October 15, 2010 11:49 AM
To: solr-user@lucene.apache.org
Subject: filter query from external list of Solr unique IDs

At the Lucene Revolution conference I asked about efficiently building a filter 
query from an external list of Solr unique ids.

Some use cases I can think of are:
1)  personal sub-collections (in our case a user can create a small subset 
of our 6.5 million doc collection and then run filter queries against it)
2)  tagging documents
3)  access control lists
4)  anything that needs complex relational joins
5)  a sort of alternative to incremental field updating (i.e. update in an 
external database or kv store)
6)  Grant's clustering cluster points and similar apps.

Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't seem to be 
any work on it yet.

Hoss  mentioned a couple of ideas:
1) sub-classing query parser
2) Having the app query a database and somehow passing something to 
Solr or lucene for the filter query

Can Hoss or someone else point me to more detailed information on what might be 
involved in the two ideas listed above?

Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids 
needed to implement this or is that a separate issue?


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search






filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
At the Lucene Revolution conference I asked about efficiently building a filter 
query from an external list of Solr unique ids.

Some use cases I can think of are:
1)  personal sub-collections (in our case a user can create a small subset 
of our 6.5 million doc collection and then run filter queries against it)
2)  tagging documents
3)  access control lists
4)  anything that needs complex relational joins
5)  a sort of alternative to incremental field updating (i.e. update in an 
external database or kv store)
6)  Grant's clustering cluster points and similar apps.

Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't seem to be 
any work on it yet.

Hoss  mentioned a couple of ideas:
1) sub-classing query parser
2) Having the app query a database and somehow passing something to 
Solr or lucene for the filter query

Can Hoss or someone else point me to more detailed information on what might be 
involved in the two ideas listed above?

Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids 
needed to implement this or is that a separate issue?


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search






Re: searching while importing

2010-10-15 Thread Gora Mohanty
On Thu, Oct 14, 2010 at 4:08 AM, Shawn Heisey  wrote:
>  If you are using the DataImportHandler, you will not be able to search new
> data until the full-import or delta-import is complete and the update is
> committed.  When I do a full reindex, it takes about 5 hours, and until it
> is finished, I cannot search it.
>
> I have not tried to issue a manual commit in the middle of an import to see
> whether that makes data inserted up to that point searchable, but I would
> not expect that to work.
[...]

Just as a data point, we have done this, and yes it is possible to do a commit
in the middle of an import, and have the documents that have already been
indexed be available for search.
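For anyone trying this, the mid-import commit can simply be issued from a separate client; a minimal SolrJ sketch (the URL is whatever your instance is, and hitting /update?commit=true does the same thing):

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
server.commit();  // documents already indexed by the still-running import become searchable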

Regards,
Gora


Re: Term is duplicated when updating a document

2010-10-15 Thread Thomas Kellerer

Thanks for the answer.


Which fields are modified when the document is updated/replaced.


Only one field was changed, but it was not the one where the auto-suggest term 
is coming from.


Are there any differences in the content of the fields that you are using
for the AutoSuggest.

No


Have you changed you schema.xml file recently? If you have, then there may
have been changes in the way these fields are analyzed and broken down to
terms.


No, I did a complete index rebuild to rule out things like that.
Then after startup, did a search, then updated the document and did a search 
again.

Regards
Thomas
 

This may be a bug if you did not change the field or the schema file but the
terms count is changing.

On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer  wrote:


Hi,

we are updating our documents (that represent products in our shop) when a
dealer modifies them, by calling
SolrServer.add(SolrInputDocument) with the updated document.

My understanding is, that there is no other way of updating an existing
document.


However we also use a term query to autocomplete the search field for the
user, but each time a document is updated (added) the term count is
incremented. So after starting with a new index the count is e.g. 1, then
the document (that contains that term) is updated, and the count is 2, the
next update will set this to 3 and so on.

Once the index is optimized (by calling SolrServer.optimize()) the count is
correct again.

Am I missing something or is this a bug in Solr/Lucene?

Thanks in advance
Thomas










Possible to sort by explicit docid order?

2010-10-15 Thread Jan Høydahl / Cominvent
Hi,

In an online bookstore project I'm working on, most frontend widgets are search 
driven. Most often they query with some filters and a sort order, such as 
availabledate desc or simply by score.

However, to allow editorial control, some widgets will display a fixed list of 
books, defined as an ordered list of ISBN numbers inserted by the editor. Based 
on this we do a Solr search to fetch the data to display: 
&fq=isbn:(9788200011699 OR 9788200012658 OR ...)

It is important to return the results in the same order as the explicitly given 
list of ISBNs. But I cannot see a way to do that, not even with sort by 
function. So currently we re-order the result list in the frontend.
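(That re-ordering is just a sort by position in the editor's ISBN list; a rough SolrJ-style sketch of what the frontend does, with the variable names made up:)

final List<String> isbnOrder = Arrays.asList("9788200011699", "9788200012658", "9788200013839");
List<SolrDocument> docs = new ArrayList<SolrDocument>(rsp.getResults());
Collections.sort(docs, new Comparator<SolrDocument>() {
  public int compare(SolrDocument a, SolrDocument b) {
    int pa = isbnOrder.indexOf((String) a.getFieldValue("isbn"));
    int pb = isbnOrder.indexOf((String) b.getFieldValue("isbn"));
    if (pa < 0) pa = Integer.MAX_VALUE;  // ISBNs not in the list go last
    if (pb < 0) pb = Integer.MAX_VALUE;
    return pa < pb ? -1 : (pa == pb ? 0 : 1);
  }
});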

Would it make sense with an "explicit" sort order, perhaps implemented as a 
function?

&sort=fieldvaluelist(isbn,1000,1,0,$isbnorder) desc, price 
asc&isbnorder=9788200011699,9788200012658,9788200013839,9788200014140

The function would be defined as
  
fieldvaluelist([,...])
The output of the example above would be:
  For document with ISBN=9788200011699: 1000
  For document with ISBN=9788200012658: 999
  For document with ISBN=9788200013839: 998
  For document with ISBN not in the list: 0 (fallback - in which case the 
second sort order would kick in)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Term is duplicated when updating a document

2010-10-15 Thread Israel Ekpo
Which fields are modified when the document is updated/replaced.

Are there any differences in the content of the fields that you are using
for the AutoSuggest.

Have you changed you schema.xml file recently? If you have, then there may
have been changes in the way these fields are analyzed and broken down to
terms.

This may be a bug if you did not change the field or the schema file but the
terms count is changing.

On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer  wrote:

> Hi,
>
> we are updating our documents (that represent products in our shop) when a
> dealer modifies them, by calling
> SolrServer.add(SolrInputDocument) with the updated document.
>
> My understanding is, that there is no other way of updating an existing
> document.
>
>
> However we also use a term query to autocomplete the search field for the
> user, but each time a document is updated (added) the term count is
> incremented. So after starting with a new index the count is e.g. 1, then
> the document (that contains that term) is updated, and the count is 2, the
> next update will set this to 3 and so on.
>
> Once the index is optimized (by calling SolrServer.optimize()) the count is
> correct again.
>
> Am I missing something or is this a bug in Solr/Lucene?
>
> Thanks in advance
> Thomas
>
>


-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Exception being thrown indexing a specific pdf document using Solr Cell

2010-10-15 Thread Shaun Campbell
I've got an existing Spring Solr SolrJ application that indexes a mixture of
documents.  It seems to have been working fine now for a couple of weeks but
today I've just started getting an exception when processing a certain pdf
file.

The exception is :

ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.PDFParser@4683c2
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
at
uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308)
at
uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
at
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
at
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
at
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
at
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
at
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
at
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
at
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
at
org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
at
org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
at
org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
at
org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
at
org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
at
org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
at
org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
at
org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.pdf.PDFParser@4683c2
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLo

Term is duplicated when updating a document

2010-10-15 Thread Thomas Kellerer

Hi,

we are updating our documents (that represent products in our shop) when a 
dealer modifies them, by calling
SolrServer.add(SolrInputDocument) with the updated document.

My understanding is, that there is no other way of updating an existing 
document.


However we also use a term query to autocomplete the search field for the user, 
but each time a document is updated (added) the term count is incremented. So 
after starting with a new index the count is e.g. 1, then the document (that 
contains that term) is updated, and the count is 2, the next update will set 
this to 3 and so on.

Once the index is optimized (by calling SolrServer.optimize()) the count is 
correct again.

Am I missing something or is this a bug in Solr/Lucene?

Thanks in advance
Thomas



Re: Quick question on indexing an existing index

2010-10-15 Thread Jan Høydahl / Cominvent
Why don't you simply index the source content which you used to build index2 
into index1, i.e. have your "tool" index to both? You won't save anything on 
trying to extract that content from an existing index. But of course, you COULD 
write yourself a tool which extracts all stored fields for all documents in 
index2, transforms them into docs which fit in index1, and then inserts them. But 
how will you support deletes etc.?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14. okt. 2010, at 17.06, bbarani wrote:

> 
> Hi,
> 
> I have a very simple question about indexing an existing index.
> 
> We have 2 index, index 1 is being maintained by us (it indexes the data from
> a database) and we have an index 2 which is maintained by a tool..
> 
> Both the schemas are totally different, but we are interested in re-indexing the
> index present in index2 into index1 such that we will have just one
> single index (index 1) which will contain the data present in both indexes.
> 
> We want to re-index the index present in index 2 using the schema present for
> index 1. Also we are interested in customizing the data (something like
> selecting columns / fields from DB using DB import handler).
> 
> Thanks,
> BB
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Quick-question-on-indexing-an-existing-index-tp1701663p1701663.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLRJ - Searching text in all fields of a Bean

2010-10-15 Thread Subhash Bhushan
Hi Savvas,

Thanks!! Was able to search using  directive.

I was using the default example schema packaged with solr. I added the
following directive for title field and reindexed data:
**

Regards,
Subhash Bhushan.

On Fri, Oct 8, 2010 at 2:09 PM, Savvas-Andreas Moysidis <
savvas.andreas.moysi...@googlemail.com> wrote:

> Hello,
>
> What does your schema look like? Have you defined a  "catch all" field and
> copy every value from all your other fields in it with a 
> directive?
>
> Cheers,
> -- Savvas
>
>
> On 8 October 2010 08:30, Subhash Bhushan wrote:
>
>> Hi,
>>
>> I have two fields in the bean class, id and title.
>> After adding the bean to SOLR, I want to search for, say "kitten", in all
>> defined fields in the bean, like this -- query.setQuery( "kitten"); --
>> But I get results only when I affix the bean field name before the search
>> text like this -- query.setQuery( "title:kitten"); --
>>
>> Same case even when I use SolrInputDocument, and add these fields.
>>
>> Can we search text in all fields of a bean, without having to specify a
>> field?
>> If we can, what am I missing in my code?
>>
>> *Code:*
>> Bean:
>> ---
>> public class SOLRTitle {
>> @Field
>> public String id = "";
>>  @Field
>> public String title = "";
>> }
>> ---
>> Indexing function:
>> ---
>>
>> private static void uploadData() {
>>
>> try {
>> ... // Get Titles
>>List solrTitles = new
>> ArrayList();
>> Iterator it = titles.iterator();
>> while(it.hasNext()) {
>> Title title = (Title) it.next();
>> SOLRTitle solrTitle = new SOLRTitle();
>> solrTitle.id = title.getID().toString();
>> solrTitle.title = title.getTitle();
>> solrTitles.add(solrTitle);
>> }
>> server.addBeans(solrTitles);
>> server.commit();
>> } catch (SolrServerException e) {
>> e.printStackTrace();
>> } catch (IOException e) {
>> e.printStackTrace();
>> }
>> }
>> ---
>> Querying function:
>> ---
>>
>> private static void queryData() {
>>
>> try {
>> SolrQuery query = new SolrQuery();
>> query.setQuery( "kitten");
>>
>>QueryResponse rsp = server.query( query );
>>List beans = rsp.getBeans(SOLRTitle.class);
>>System.out.println(beans.size());
>>Iterator it = beans.iterator();
>>while(it.hasNext()) {
>> SOLRTitle solrTitle = (SOLRTitle)it.next();
>> System.out.println(solrTitle.id);
>> System.out.println(solrTitle.title);
>>}
>> } catch (SolrServerException e) {
>> e.printStackTrace();
>> }
>> }
>> --
>>
>> Subhash Bhushan.
>>
>
>


Re: problem on running fullimport

2010-10-15 Thread Ken Stanley
On Fri, Oct 15, 2010 at 7:42 AM, swapnil dubey wrote:

> Hi,
>
> I am using the full import option with the data-config file as mentioned
> below
>
> 
>url="jdbc:mysql:///xxx" user="xxx" password="xx"  />
>
>
>
>
>
> 
>
>
> on running the full-import option I am getting the error mentioned below.I
> had already included the dataimport.properties file in my conf file.help me
> to get the issue resolved
>
> 
> -
> 
> 0
> 334
> 
> -
> 
> -
> 
> data-config.xml
> 
> 
> full-import
> debug
> 
> -
> 
> -
> 
> -
> 
> select studentName from test1
> -
> 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select studentName from test1 Processing Document # 1
> ...
>
> --
> Regards
> Swapnil Dubey
>

Swapnil,

Everything looks fine, except that in your entity definition you forgot to
define which data source you wish to use. If you add
'dataSource="JdbcDataSource"', that should get rid of your exception. As a
reminder, the DataImportHandler wiki (
http://wiki.apache.org/solr/DataImportHandler) on Apache's website is very
helpful for learning how to use the DIH properly. It has helped me to have
a printed copy beside me for easy and quick reference.

- Ken


problem on running fullimport

2010-10-15 Thread swapnil dubey
Hi,

I am using the full import option with the data-config file as mentioned
below


   








on running the full-import option I am getting the error mentioned below.I
had already included the dataimport.properties file in my conf file.help me
to get the issue resolved


-

0
334

-

-

data-config.xml


full-import
debug

-

-

-

select studentName from test1
-

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: select studentName from test1 Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:184)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.sql.SQLException: Illegal value for setFetchSize().
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
at com.mysql.jdbc.StatementImpl.setFetchSize(StatementImpl.java:2496)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:242)
... 33 more

0:0:0.50



idle
Configuration Re-loaded sucessfully
-

0:0:0.299
1
0
0
0
2010-10-15 16:42:21
Indexing failed. Rolled back all changes.
2010-10-15 16:42:21

-

This response format is experimental.  It is likely to change in the future.



-- 
Regards
Swapnil Dubey


Re: SOLRJ - Searching text in all fields of a Bean

2010-10-15 Thread Subhash Bhushan
Ahmet,

I got it working to an extent.

Now:
SolrQuery query = new SolrQuery();
query.setQueryType("dismax");
query.setQuery( "kitten");
query.setParam("qf", "title");


QueryResponse rsp = server.query( query );
List beans = rsp.getBeans(SOLRTitle.class);
System.out.println(beans.size());
Iterator it = beans.iterator();
while(it.hasNext()) {
SOLRTitle solrTitle = (SOLRTitle)it.next();
System.out.println(solrTitle.id);
System.out.println(solrTitle.title);
}

*This code is able to find the record, and prints the ID. But fails to print
the Title.*

Whereas:
SolrQuery query = new SolrQuery();
query.setQuery( "title:kitten" );

QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();

Iterator iter = rsp.getResults().iterator();

while (iter.hasNext()) {
  SolrDocument resultDoc = iter.next();

  String title = (String) resultDoc.getFieldValue("title");
  String id = (String) resultDoc.getFieldValue("id"); // id is the uniqueKey field
  System.out.println(id);
  System.out.println(title);
}
*
This query succeeds!*

What am I doing wrong with the dismax params? The title field is being fetched
as null.

Regards,
Subhash Bhushan.


On Fri, Oct 8, 2010 at 2:05 PM, Ahmet Arslan  wrote:

> > I have two fields in the bean class, id and title.
> > After adding the bean to SOLR, I want to search for, say
> > "kitten", in all
> > defined fields in the bean, like this -- query.setQuery(
> > "kitten"); --
> > But I get results only when I affix the bean field name
> > before the search
> > text like this -- query.setQuery( "title:kitten"); --
> >
> > Same case even when I use SolrInputDocument, and add these
> > fields.
> >
> > Can we search text in all fields of a bean, without having
> > to specify a
> > field?
>
> With dismax, you can query several fields using different boosts.
> http://wiki.apache.org/solr/DisMaxQParserPlugin
>
>
>
>
>


How do you programatically create new cores?

2010-10-15 Thread Tharindu Mathew
Hi everyone,

I'm a newbie at this and I can't figure out how to do this after going
through http://wiki.apache.org/solr/CoreAdmin?

Any sample code would help a lot.

Thanks in advance.

-- 
Regards,

Tharindu


Re: JVM GC troubles

2010-10-15 Thread accid
Hi,

I don't run totally OOM (no OOM exceptions in the log), but I constantly
garbage collect. While not collecting, the SOLR master handles the updates
pretty well.

Every insert is unique, so I don't have any deletes or optimizes, and all
queries are handled by the single slave instance. Is there a way to reduce
the objects held in the old gen space? It looks like the JVM is trying to
hold as many objects as possible in the cache, to provide fast queries, which
are not needed in my situation.

Regarding the JBoss ... well as I said, it's the minimalistic version of it
and we use it due to the automation process within our department. In my
test-env I tried it with a plain Tomcat 6.x but without any improvements, so
the JBoss overhead is minimal to nothing.

The JVM parameters I wrote are the ones I am struggling with at the moment.
I was hoping someone would come up with a hint regarding the solrconfig.xml
itself.

PS: if anyone is questioning the implemented architecture (master -> slave,
configs, schema, etc.) ... it's our architect's fault and I have to operate
it ;-)

2010/10/15 Otis Gospodnetic 

> Hello,
>
> I hope you are not running JBoss just to run Solr - there are simpler
> containers
> out there, e.g., Jetty.
> Do you OOM?
> Do things look better if you replicate less often (e.g. every 5 minutes
> instead
> of every 60 seconds)?
> Do all/some of those -X__ JVM params actually help?
>
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: accid 
> > To: solr-user@lucene.apache.org
> > Sent: Thu, October 14, 2010 1:25:34 PM
> > Subject: Re: JVM GC troubles
> >
> > I forgot a few important details:
> >
> > solr version = 1.4.1
> > current index  size = 50gb
> > growth ~600mb / day
> > jboss runs with web settings (same as  minimal)
> > 2010/10/14 
> >
> > >  Hi,
> > >
> > > as I am new here, I want to say hello and thanks in advance  for your
> help.
> > >
> > >
> > > HW Setup:
> > >
> > > 1x SOLR Master  - Sun Microsystems SUN FIRE X4450 - 4 x 2,93ghz, 64gb
> ram
> > > 1x SOLR Slave  -  Sun Microsystems SUN FIRE X4450 - 4 x 2,93ghz, 64gb
> ram
> > >
> > >  SW Setup:
> > >
> > > Solaris 10 Generic_142901-03
> > > jboss  5.1.0
> > > JDK 1.6 update 18
> > >
> > >
> > > # Specify the exact Java  VM executable to use.
> > > #
> > >  JAVA="/opt/appsrv/java6/bin/amd64/java"
> > >
> > > #
> > > # Specify  options to pass to the Java VM.
> > > #
> > > JAVA_OPTS="-server -Xms6144m  -Xmx6144m -Xmn3072m
> -XX:ThreadStackSize=1024
> > > -XX:MaxPermSize=512m  -Dorg.jboss.resolver.warning=true
> > >  -Dsun.rmi.dgc.client.gcInterval=360
> > >  -Dsun.rmi.dgc.server.gcInterval=360
> -Dnetworkaddress.cache.ttl=1800
> > >  -XX:+UseConcMarkSweepGC"
> > >
> > >
> > > SOLR Setup:
> > >
> > > #)  the master has to deal an avg. update rate of 50 updates/s and
> peaks of
> > >  400 updates/s
> > >
> > > #) the slave replicates every 60s using the built  in solr replication
> > > method (NOT rsync)
> > >
> > > #) the slave  querys are ~20/sec
> > >
> > >
> > > #) schema.xml
> > >
> > >
> > >   > >  required="true"/>
> > >  > > required="true"/>
> > >  > > required="true"/>
> > >   > >  required="true"/>
> > >  > > required="true"/>
> > >  > > required="true"/>
> > >  
> > > 
> > > 
> > > 
> > > 
> > > 
> > >  
> > >  > > multiValued="true"/>
> > >  > >  multiValued="true"/>
> > >  > > multiValued="true"/>
> > >  > >  multiValued="true"/>
> > >  > > multiValued="true"/>
> > >  > >  multiValued="true"/>
> > > 
> > >  > > required="true"/>
> > >  > > default="NOW"  multiValued="false"/>
> > >
> > >
> > > #) The solarconfig.xml is  attached
> > >
> > >
> > >
> > > Both, master & slave suffer from  serious performance impacts during
> garbage
> > >  collects
> > >
> > >
> > > I obviously have an GC problem, because ~30min  after startup, the Old
> space
> > > is full and not beeing freed  up.
> > >
> > > Below you find a JMX copy&paste of the Heap AFTER a  garbage collect!!
> As
> > > you can see, even the Eden Space can only free up  to 700mb total,
> which
> > > gives very little time to relax. The system does  GC's 90% of the time.
> > >
> > >
> > >
> > >
> > > Total Memory  Pools: 5
> > >
> > >Pool: Code Cache (Non-heap  memory)
> > >
> > >Peak Usage : init:4194304,  used:7679360, committed:7798784,
> > > max:50331648
> > > Current Usage : init:4194304, used:7677312,  committed:7798784,
> > > max:50331648
> > >
> > >
> > > |-| committed:7.44Mb
> > >
> > >
> +-+
> > > |/| | max:48Mb
> > >
> > >
> +-+
> > > |-|  used:7.32Mb
> > >
> > >
> > >Pool: Par Eden Space (Heap  memory)
> > >
> > >Peak Usage : init:2577006592,  used:2577006592,
> committed:2577006592,
> > > max:2577006592
> > >   

SolrJ API for multi core?

2010-10-15 Thread Tharindu Mathew
Hi,

Is $subject available??

Or do I need to make HTTP Get calls?


-- 
Regards,

Tharindu