Re: Mr Lance : customize the search algorithm of solr

2010-06-22 Thread Lance Norskog
Solr depends on Lucene's implementation of queries and how it returns
document hits. I can't help you architect these changes.

On Mon, Jun 21, 2010 at 7:47 AM, sarfaraz masood
 wrote:
> Mr Lance
>
> Thanks a lot for your reply. I am a novice at Solr/Lucene, but I have gone
> through the documentation of both, and I have even implemented programs in
> Lucene for searching, etc.
>
> My problem is to apply a new search technique other than the one used by solr.
>
> Step 1: My algorithm finds the tf-idf values of all the terms in each URL
> and makes a chart like this:
>
>          term1  term2  term3 ...
> url 1     0.7    0.6    0.7
> url 2     0.0    0.5    0.4
> url 3     0.7    0.8    0.6
> ...
>
> (A tf-idf of 0 means the term does not occur in that URL.)
>
> This way I first construct a complete chart of term tf-idf values for all URLs.
>
> Step 2 (Searcher):
> Then, depending on the words in the query, I select the matching URLs by
> applying mathematical formulae. The results should be shown to the user in
> descending order.
>
> Now, as I know, Lucene has its own searcher, which is used by Solr as
> well. Can't I replace this searcher part of Solr with a Java program that
> returns URLs according to my algorithm? Everything else would remain
> Solr's.
>
> Only the searcher part changes. I have studied customizing the scoring,
> which is absolutely not my aim. My aim seems to be replacing the searcher.
> It is work similar to the BM25 work you mentioned in your reply, viz.
> providing an alternative to Lucene search.
>
> Please help me in this regard. I will be highly grateful to you for your
> assistance in this work of mine.
>
> If any part of this mail was not clear to you, please let me know and I
> will explain it.
>
> Regards
>
> -sarfaraz
>
>
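
For reference, the cells of such a chart would typically hold the classic
tf-idf weight. A minimal sketch in Java, assuming the textbook formulation
(the poster's exact formula is not given, so this is an assumption):

    // Hedged sketch: Lucene's default tf and idf factors; the thread does
    // not specify which tf-idf variant the poster's algorithm uses.
    static double tfIdf(int termFreqInDoc, int totalDocs, int docsContainingTerm) {
        double tf = Math.sqrt(termFreqInDoc);
        double idf = 1.0 + Math.log(totalDocs / (double) (docsContainingTerm + 1));
        return tf * idf;
    }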



-- 
Lance Norskog
goks...@gmail.com


Re: OOM on sorting on dynamic fields

2010-06-22 Thread Lance Norskog
No, this is basic to how Lucene works. You will need larger EC2 instances.

On Mon, Jun 21, 2010 at 2:08 AM, Matteo Fiandesio
 wrote:
> Compiling solr with lucene 2.9.3 instead of 2.9.1 will solve this issue?
> Regards,
> Matteo
>
> On 19 June 2010 02:28, Lance Norskog  wrote:
>> The Lucene implementation of sorting creates an array of four-byte
>> ints for every document in the index, and another array of the unique
>> values in the field.
>> If the timestamps are 'date' or 'tdate' in the schema, they do not
>> need the second array.
>>
>> You can also sort by a field with a function query. This does not
>> build the arrays, but might be a little slower.
>> Yes, the sort arrays (and also facet values for a field) should be
>> controlled by a fixed-size cache, but they are not.
>>
>> On Fri, Jun 18, 2010 at 7:52 AM, Matteo Fiandesio
>>  wrote:
>>> Hello,
>>> we are experiencing OOM exceptions in our single core solr instance
>>> (on a (huge) amazon EC2 machine).
>>> We investigated a lot in the mailing list and through jmap/jhat dump
>>> analyzing and the problem resides in the lucene FieldCache that fills
>>> the heap and blows up the server.
>>>
>>> Our index is quite small, but we have a lot of sort queries on dynamic
>>> fields of type long representing timestamps, which are not present in
>>> all the documents.
>>> Those queries apply sorting on 12-15 of those fields.
>>>
>>> We are using solr 1.4 in production and the dump shows a lot of
>>> Integer/Character and Byte arrays filled up with 0s.
>>> With solr's trunk code things do not change.
>>>
>>> In the mailing list we saw a lot of messages related to this issue:
>>> we tried truncating the dates to day precision, using missingSortLast =
>>> true, changing the field type from slong to long, setting autowarming to
>>> different values, and disabling and enabling caches with different values,
>>> but we did not manage to solve the problem.
>>>
>>> We were thinking of implementing an LRUFieldCache field type to manage
>>> the FieldCache as an LRU and prevent this, but before starting a new
>>> development, we want to be sure that we are not doing anything wrong
>>> in the solr configuration or in the index generation.
>>>
>>> Any help would be appreciated.
>>> Regards,
>>> Matteo
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Alternative for field collapsing

2010-06-22 Thread Rakhi Khatwani
Hi,
I wanted to apply field collapsing on the title (type string), but
want to show only one document (and the count of such documents) per title
rather than showing all the documents.

Regards
Raakhi


On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich  wrote:

> Hi Raakhi,
>
> First, field collapsing works pretty well in our system. And, as Martin
> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236":
>
> I've added a new patch to the issue, so building the trunk (rev
> 955615) with the latest patch should not be a problem. Due to recent
> changes in the Lucene trunk the patch was not compatible.
>
> Second, if the id is unique, applying field collapsing makes no sense. So I
> suppose you will apply field collapsing to the title, right?
> But in this case, why doesn't a simple query a la q=title:'my
> title'&sort=price asc work for you? Or what do you want to achieve?
> (The title should be of type string, I think)
>
> Regards,
> Peter.
>
> > Hi,
> >   I have an index with the following fields:
> >   id  (unique)
> >   title
> >   description
> >   price.
> >
> > Suppose I want to find unique documents and the count of all documents
> > with the same title, sorted on price. How do I go about it, knowing that
> > field collapsing is not stable with 1.4?
> > If I go about using facets on id, it sorts either on id or on the count,
> > but not on the price.
> >
> > Any Suggestions??
> > Regards,
> > Raakhi
> >
> >
>
>
>


RE: solr string field

2010-06-22 Thread ZAROGKIKAS,GIORGOS
It's OK; it was a problem with my schema.

Thanks anyway

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Monday, June 21, 2010 5:09 PM
To: solr-user@lucene.apache.org
Subject: Re: solr string field

Or even better for an exact string query:

q={!raw f=field_name}sony vaio

(that's NOT URL encoded, but needs to be when sending the request over  
HTTP)
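
For example, the same query URL-encoded (%7B/%7D are the curly braces,
%20 the space; an illustrative encoding):

http://localhost:8983/solr/select?q=%7B!raw%20f=field_name%7Dsony%20vaio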

Erik


On Jun 21, 2010, at 9:43 AM, Jan Høydahl / Cominvent wrote:

> Hi,
>
> You either need to quote your string: http://localhost:8983/solr/select?q= 
> "sony+vaio"
> or to escape the space: http://localhost:8983/solr/select?q=sony\+vaio
>
> If you do not do one of these, your query will be parsed as  
> text:sony OR text:vaio, which will not match your string field.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 21. juni 2010, at 14.42, ZAROGKIKAS,GIORGOS wrote:
>
>> Hi,
>>  I use a string field in my Solr schema,
>>  but when I query a value with a space it doesn't give me results.
>>
>>  E.g. I have a value "sony vaio"; when I query with "sony vaio" I
>> get 0 results,
>>  but when I query "sony*" I get my results.
>>
>>  How can I query a string field with a space between the values,
>>  or how can I have exact search on a string?
>>
>>
>> Thanks in advance
>>
>



Re: performance sorting multivalued field

2010-06-22 Thread Marc Sturlese

>>Well, sorting requires that all the unique values in the target field
>>get loaded into memory
That's what I thought, thanks.

>>But a larger question is whether what your doing is worthwhile
>>even as just a measurement. You say
>>"This is good for me, I don't care for my tests". I claim that
>>you do care
I just like playing with things. First I checked the behavior of sorting on a
multiValued field, and what I noticed was: let's say you have docs with a field
called 'num':
doc1->num:2; doc2->num:1,num:4; doc3->num:5
Sorting asc by the field num, what I get is: doc2, doc1, doc3.
The behavior seems to be always the same (I am not saying it works like that,
but it's what I've seen in my examples).
After seeing that I just decided to check the performance. The point is
simply curiosity.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p913626.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: LocalParams?

2010-06-22 Thread Peter Karich
E.g. take a look at:
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
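
One common use is multi-select faceting, where a filter query is tagged and
then excluded from a facet. A hedged sketch (field and tag names made up):

q=*:*&fq={!tag=colorfq}color:red&facet=true&facet.field={!ex=colorfq}color

The {!tag=...} local param labels the filter, and {!ex=...} tells the color
facet to ignore that filter, so counts for the other colors still show up.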

Peter.

> Huh? I read through the wiki (see http://wiki.apache.org/solr/LocalParams) but I
> still don't understand its utility.
>
> Can someone explain to me why this would even be used? Any examples to help
> clarify? Thanks!
>   




Re: OOM on sorting on dynamic fields

2010-06-22 Thread Matteo Fiandesio
First of all thanks for your answers.
Those OOMEs are pretty nasty for our production environment.
I didn't try the solution of ordering by a function, as it is a Solr 1.5
feature and we prefer to use the stable version 1.4.

I made a temporary patch that looks like it is working fine.
I patched the lucene-core-2.9.1 source code, adding these lines to the
abstract static class Cache's get method:

...
public Object get(IndexReader reader, Entry key) throws IOException {
  Map innerCache;
  Object value;
+ final Object readerKey = reader.getFieldCacheKey();
+ CacheEntry[] cacheEntries = wrapper.getCacheEntries();
+ if (cacheEntries.length > A_TUNED_INT_VALUE) {
+   readerCache.clear();
+ }
...

I didn't notice any delays or concurrency problems.




On 22 June 2010 07:27, Lance Norskog  wrote:
> No, this is basic to how Lucene works. You will need larger EC2 instances.
>
> On Mon, Jun 21, 2010 at 2:08 AM, Matteo Fiandesio
>  wrote:
>> Compiling solr with lucene 2.9.3 instead of 2.9.1 will solve this issue?
>> Regards,
>> Matteo
>>
>> On 19 June 2010 02:28, Lance Norskog  wrote:
>>> The Lucene implementation of sorting creates an array of four-byte
>>> ints for every document in the index, and another array of the unique
>>> values in the field.
>>> If the timestamps are 'date' or 'tdate' in the schema, they do not
>>> need the second array.
>>>
>>> You can also sort by a field with a function query. This does not
>>> build the arrays, but might be a little slower.
>>> Yes, the sort arrays (and also facet values for a field) should be
>>> controlled by a fixed-size cache, but they are not.
>>>
>>> On Fri, Jun 18, 2010 at 7:52 AM, Matteo Fiandesio
>>>  wrote:
 Hello,
 we are experiencing OOM exceptions in our single core solr instance
 (on a (huge) amazon EC2 machine).
 We investigated a lot in the mailing list and through jmap/jhat dump
 analyzing and the problem resides in the lucene FieldCache that fills
 the heap and blows up the server.

 Our index is quite small, but we have a lot of sort queries on dynamic
 fields of type long representing timestamps, which are not present in
 all the documents.
 Those queries apply sorting on 12-15 of those fields.

 We are using solr 1.4 in production and the dump shows a lot of
 Integer/Character and Byte arrays filled up with 0s.
 With solr's trunk code things do not change.

 In the mailing list we saw a lot of messages related to this issue:
 we tried truncating the dates to day precision, using missingSortLast =
 true, changing the field type from slong to long, setting autowarming to
 different values, and disabling and enabling caches with different values,
 but we did not manage to solve the problem.

 We were thinking of implementing an LRUFieldCache field type to manage
 the FieldCache as an LRU and prevent this, but before starting a new
 development, we want to be sure that we are not doing anything wrong
 in the solr configuration or in the index generation.

 Any help would be appreciated.
 Regards,
 Matteo

>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Alternative for field collapsing

2010-06-22 Thread Peter Karich
Hi Raakhi,

yes, the collapse patch then works perfectly in our case. If you don't
get the patch applied correctly, try asking directly here:
https://issues.apache.org/jira/browse/SOLR-236

I did the same and got an immediate response from Martijn & Co. Or try the
latest patch:
2010-06-17 03:08 PM Martijn van Groningen

Querying is simple:
q=peter&collapse.field=title

and you will get back only one document per title containing 'peter', and
additionally the 'similar'/collapse count for every document:

  <lst name="collapse_counts">
    <str name="field">title</str>
    <lst name="doc">
      <int ...>4512</int>
      <int ...>4010</int>
      ...
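
(Presumably the usual parameters still combine with this, e.g.
q=peter&collapse.field=title&sort=price asc to get one document per title
ordered by price, which is what was asked for; an untested assumption about
the patch.)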

Regards,
Peter.

> Hi,
> I wanted to apply field collapsing on the title (type string), but
> want to show only one document (and the count of such documents) per title
> rather than showing all the documents.
>
> Regards
> Raakhi
>
>
> On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich  wrote:
>
>   
>> Hi Raakhi,
>>
>> First, field collapsing works pretty well in our system. And, as Martin
>> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236":
>>
>> I've added a new patch to the issue, so building the trunk (rev
>> 955615) with the latest patch should not be a problem. Due to recent
>> changes in the Lucene trunk the patch was not compatible.
>>
>> Second, if the id is unique, applying field collapsing makes no sense. So I
>> suppose you will apply field collapsing to the title, right?
>> But in this case, why doesn't a simple query a la q=title:'my
>> title'&sort=price asc work for you? Or what do you want to achieve?
>> (The title should be of type string, I think)
>>
>> Regards,
>> Peter.
>>
>> 
>>> Hi,
>>>   I have an index with the following fields:
>>>   id  (unique)
>>>   title
>>>   description
>>>   price.
>>>
>>> Suppose I want to find unique documents and the count of all documents
>>> with the same title, sorted on price. How do I go about it, knowing that
>>> field collapsing is not stable with 1.4?
>>> If I go about using facets on id, it sorts either on id or on the count,
>>> but not on the price.
>>>
>>> Any Suggestions??
>>> Regards,
>>> Raakhi
>>>
>>>
>>>   
>>
>>
>> 
>   


-- 
http://karussell.wordpress.com/



Re: Alternative for field collapsing

2010-06-22 Thread Peter Karich
ups, sorry. I meant Martijn! Not the germanized Martin :-/

Peter.

> Hi,
> I wanted to apply field collapsing on the title (type string), but
> want to show only one document (and the count of such documents) per title
> rather than showing all the documents.
>
> Regards
> Raakhi
>
>
> On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich  wrote:
>
>   
>> Hi Raakhi,
>>
>> First, field collapsing works pretty well in our system. And, as Martin
>> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236":
>>
>> I've added a new patch to the issue, so building the trunk (rev
>> 955615) with the latest patch should not be a problem. Due to recent
>> changes in the Lucene trunk the patch was not compatible.
>>
>> Second, if the id is unique, applying field collapsing makes no sense. So I
>> suppose you will apply field collapsing to the title, right?
>> But in this case, why doesn't a simple query a la q=title:'my
>> title'&sort=price asc work for you? Or what do you want to achieve?
>> (The title should be of type string, I think)
>>
>> Regards,
>> Peter.
>>
>> 
>>> Hi,
>>>   I have an index with the following fields:
>>>   id  (unique)
>>>   title
>>>   description
>>>   price.
>>>
>>> Suppose I want to find unique documents and the count of all documents
>>> with the same title, sorted on price. How do I go about it, knowing that
>>> field collapsing is not stable with 1.4?
>>> If I go about using facets on id, it sorts either on id or on the count,
>>> but not on the price.
>>>
>>> Any Suggestions??
>>> Regards,
>>> Raakhi
>>>
>>>   



Re: Alternative for field collapsing

2010-06-22 Thread Rakhi Khatwani
Thanks Peter :)

On Tue, Jun 22, 2010 at 3:08 PM, Peter Karich  wrote:

> ups, sorry. I meant Martijn! Not the germanized Martin :-/
>
> Peter.
>
> > Hi,
> > I wanted to apply field collapsing on the title (type string), but
> > want to show only one document (and the count of such documents) per title
> > rather than showing all the documents.
> >
> > Regards
> > Raakhi
> >
> >
> > On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich  wrote:
> >
> >
> >> Hi Raakhi,
> >>
> >> First, field collapsing works pretty well in our system. And, as Martin
> >> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236":
> >>
> >> I've added a new patch to the issue, so building the trunk (rev
> >> 955615) with the latest patch should not be a problem. Due to recent
> >> changes in the Lucene trunk the patch was not compatible.
> >>
> >> Second, if the id is unique, applying field collapsing makes no sense. So I
> >> suppose you will apply field collapsing to the title, right?
> >> But in this case, why doesn't a simple query a la q=title:'my
> >> title'&sort=price asc work for you? Or what do you want to achieve?
> >> (The title should be of type string, I think)
> >>
> >> Regards,
> >> Peter.
> >>
> >>
> >>> Hi,
> >>>   I have an index with the following fields:
> >>>   id  (unique)
> >>>   title
> >>>   description
> >>>   price.
> >>>
> >>> Suppose I want to find unique documents and the count of all documents
> >>> with the same title, sorted on price. How do I go about it, knowing
> >>> that field collapsing is not stable with 1.4?
> >>> If I go about using facets on id, it sorts either on id or on the
> >>> count, but not on the price.
> >>>
> >>> Any Suggestions??
> >>> Regards,
> >>> Raakhi
> >>>
> >>>
>
>


[NEWS] New Response Writer for Native PHP Solr Client

2010-06-22 Thread Israel Ekpo
Hi Solr users,

If you are using Apache Solr via PHP, I have some good news for you.

There is a new response writer for the PHP native extension, currently
available as a plugin.

This new feature adds a new response writer class to the
org.apache.solr.request package.

This class is used by the PHP Native Solr Client driver to prepare the query
response from Solr.

This response writer allows you to configure the way the data is serialized
for the PHP client.

You can use your own class name and you can also control how the properties
are serialized as well.

The formatting of the response data is very similar to the way it is
currently done by the PECL extension on the client side.

The only difference now is that this serialization is happening on the
server side instead.

You will find this new response writer particularly useful when dealing with
responses for

- highlighting
- admin threads responses
- more like this responses

to mention just a few

You can pass the "objectClassName" request parameter to specify the class
name to be used for serializing objects.

Please note that the class must be available on the client side to avoid a
PHP_Incomplete_Object error during the unserialization process.

You can also pass in the "objectPropertiesStorageMode" request parameter
with either a 0 (independent properties) or a 1 (combined properties).

These parameters can also be passed as a named list when loading the
response writer in the solrconfig.xml file

Having this control allows you to create custom objects which gives the
flexibility of implementing custom __get methods, ArrayAccess, Traversable
and Iterator interfaces on the PHP client side.

Until this class is incorporated into Solr, you simply have to copy the jar
file containing this plugin into your lib directory under $SOLR_HOME.

The jar file is available here

https://issues.apache.org/jira/browse/SOLR-1967

Then set up the configuration as shown below and then restart your servlet
container

Below is an example configuration in solrconfig.xml:

<queryResponseWriter name="phpnative"
    class="org.apache.solr.request.PHPNativeResponseWriter">
  <str name="objectClassName">SolrObject</str>
  <int name="objectPropertiesStorageMode">0</int>
</queryResponseWriter>
Below is an example implementation on the PHP client side.

Support for specifying custom response writers will be available starting
from the 0.9.11 version (released today) of the PECL extension for Solr
currently available here

http://pecl.php.net/package/solr

Here is an example of how to use the new response writer with the PHP
client:

<?php

class SolrClass
{
    public $_properties = array();

    public function __get($property_name)
    {
        if (property_exists($this, $property_name)) {
            return $this->$property_name;
        } else if (isset($this->_properties[$property_name])) {
            return $this->_properties[$property_name];
        }

        return null;
    }
}

$options = array
(
'hostname' => 'localhost',
'port' => 8983,
'path' => '/solr/'
);

$client = new SolrClient($options);

$client->setResponseWriter("phpnative");

$response = $client->ping();

$query = new SolrQuery();

$query->setQuery("*:*");

$query->set("objectClassName", "SolrClass");
$query->set("objectPropertiesStorageMode", 1);

$response = $client->query($query);

$resp = $response->getResponse();

?>


Documentation of the changes to the PECL extension are available here

http://docs.php.net/manual/en/solrclient.construct.php
http://docs.php.net/manual/en/solrclient.setresponsewriter.php

Please contact me at ie...@php.net, if you have any questions or comments.

-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


How to wait for StreamingUpdateSolrServer to finish?

2010-06-22 Thread Stephen Duncan Jr
I'm prototyping using StreamingUpdateSolrServer.  I want to send a commit
(or optimize) after I'm done adding all of my docs, rather than wait for the
autoCommit to kick in.  However, since StreamingUpdateSolrServer is
multi-threaded, I can't simply call commit when I'm done, because that can
happen before the StreamingUpdateSolrServer actually sends all the docs.  I
would think that calling the method blockUntilFinished() before issuing the
commit would do the trick, but I still get my commit sent before the last
document is sent.  I've tried this with both Solr 1.4.0 and the latest
release candidate for Solr 1.4.1.  Has anybody else had this experience?
 Should I file a bug on blockUntilFinished()?
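
For reference, a minimal sketch of the pattern being described (SolrJ 1.4;
the URL, queue size and thread count are example values):

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
            for (int i = 0; i < 10000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                server.add(doc);
            }
            server.blockUntilFinished(); // should drain the background queues...
            server.commit();             // ...yet the commit reportedly still arrives early
        }
    }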

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Searching across multiple repeating fields

2010-06-22 Thread Mark Allan

Hi all,

Firstly, I apologise for the length of this email but I need to  
describe properly what I'm doing before I get to the problem!


I'm working on a project just now which requires the ability to store  
and search on temporal coverage data - ie. a field which specifies a  
date range during which a certain event took place.


I hunted around for a few days and couldn't find anything which seemed  
to fit, so I had a go at writing my own field type based on  
solr.PointType.  It's used as follows:

  schema.xml
	<fieldType name="daterange" class="..." dimension="2" subFieldSuffix="_i"/>
	<field name="daterange" type="daterange" multiValued="true"/>

  data.xml
	<add>
	<doc>
	...
	<field name="daterange">1940,1945</field>
	</doc>
	</add>

Internally, this gets stored as:
	<str name="daterange">1940,1945</str>
	<int name="daterange_0_i">1940</int>
	<int name="daterange_1_i">1945</int>

In due course, I'll declare the subfields as a proper date type, but  
in the meantime, this works absolutely fine.  I can search for an  
individual date and Solr will check (queryDate > daterange_0 AND  
queryDate < daterange_1 ) and the correct documents are returned.  My  
code also allows the user to input a date range in the query but I  
won't complicate matters with that just now!


The problem arises when a document has more than one "daterange" field  
(imagine a news broadcast which covers a variety of topics and hence  
time periods).


A document with two daterange fields
	<doc>
	...
	<field name="daterange">19820402,19820614</field>
	<field name="daterange">1990,2000</field>
	</doc>

gets stored internally as
	<arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
	<arr name="daterange_0_i"><int>19820402</int><int>1990</int></arr>
	<arr name="daterange_1_i"><int>19820614</int><int>2000</int></arr>


In this situation, searching for 1985 should yield zero results as it  
is contained within neither daterange, however, the above document is  
returned in the result set.  What Solr is doing is checking that the  
queryDate (1985) is greater than *any* of the values in daterange_0  
AND queryDate is less than *any* of the values in daterange_1.


How can I get Solr to respect the positions of each item in the  
daterange_0 and _1 arrays?  Ideally I'd like the search to use the  
following logic, thus preventing the above document from being  
returned in a search for 1985:
	(queryDate > daterange_0[0] AND queryDate < daterange_1[0]) OR  
(queryDate > daterange_0[1] AND queryDate < daterange_1[1])


Someone else had a very similar problem recently on the mailing list  
with a multiValued PointType field but the thread went cold without a  
final solution.


While I could filter the results when they get back to my application  
layer, it seems like it's not really the right place to do it.


Any help getting Solr to respect the positions of items in arrays  
would be very gratefully received.


Many thanks,
Mark


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Field Collapsing SOLR-236

2010-06-22 Thread Rakhi Khatwani
Hi,
  I tried checking out the latest code (rev 956715); the patch did not
work on it.
In fact I even tried hunting for the revision mentioned earlier in this
thread (i.e. rev 955615) but cannot find it in the repository (it has
revision 955569 followed by revision 955785).

Any pointers??
Regards
Raakhi

On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen <
martijn.is.h...@gmail.com> wrote:

> Oh, in that case, is the code stable enough to use it for production?
> -  Well, this feature is a patch and I think that says it all.
> Although bugs are fixed, it is definitely an experimental feature,
> and people should keep that in mind when using one of the patches.
> Does it support the features which Solr 1.4 normally supports?
>    - As far as I know, yes.
>
> I am using facets as a workaround, but then I am not able to sort on any
> other field. Is there any workaround to support this feature?
>    - Maybe http://wiki.apache.org/solr/Deduplication prevents
> adding duplicates to your index, but then you miss the collapse counts
> and other computed values.
>
> On 21 June 2010 09:04, Rakhi Khatwani  wrote:
> > Hi,
> > Oh, in that case, is the code stable enough to use it for production?
> > Does it support the features which Solr 1.4 normally supports?
> >
> > I am using facets as a workaround, but then I am not able to sort on any
> > other field. Is there any workaround to support this feature?
> >
> > Regards,
> > Raakhi
> >
> > On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen <
> > martijn.is.h...@gmail.com> wrote:
> >
> >> Hi Rakhi,
> >>
> >> The patch is not compatible with 1.4. If you want to work with the
> >> trunk. I'll need to get the src from
> >> https://svn.apache.org/repos/asf/lucene/dev/trunk/
> >>
> >> Martijn
> >>
> >> On 18 June 2010 13:46, Rakhi Khatwani  wrote:
> >> > Hi Moazzam,
> >> >
> >> >  Where did u get the src code from??
> >> >
> >> > I am downloading it from
> >> > https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
> >> >
> >> > and the latest revision in this location is 955469.
> >> >
> >> > so applying the latest patch(dated 17th june 2010) on it still
> generates
> >> > errors.
> >> >
> >> > Any Pointers?
> >> >
> >> > Regards,
> >> > Raakhi
> >> >
> >> >
> >> > On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan 
> >> wrote:
> >> >
> >> >> I knew it wasn't me! :)
> >> >>
> >> >> I found the patch just before I read this and applied it to the trunk
> >> >> and it works!
> >> >>
> >> >> Thanks Mark and martijn for all your help!
> >> >>
> >> >> - Moazzam
> >> >>
> >> >> On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen
> >> >>  wrote:
> >> >> > I've added a new patch to the issue, so building the trunk (rev
> >> >> > 955615) with the latest patch should not be a problem. Due to
> recent
> >> >> > changes in the Lucene trunk the patch was not compatible.
> >> >> >
> >> >> > On 17 June 2010 20:20, Erik Hatcher 
> wrote:
> >> >> >>
> >> >> >> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
> >> >> >>>
> >> >> >>> p.s. I'd be glad to contribute our Maven build re-organization
> back
> >> to
> >> >> the
> >> >> >>> community to get Solr properly Mavenized so that it can be
> >> distributed
> >> >> and
> >> >> >>> released more often.  For us the benefit of this structure is
> that
> >> we
> >> >> will
> >> >> >>> be able to overlay addons such as RequestHandlers and other third
> >> party
> >> >> >>> support without having to rebuild Solr from scratch.
> >> >> >>
> >> >> >> But you don't have to rebuild Solr from scratch to add a new
> request
> >> >> handler
> >> >> >> or other plugins - simply compile your custom stuff into a JAR and
> >> put
> >> >> it in
> >> >> >> /lib (or point to it with  in solrconfig.xml).
> >> >> >>
> >> >> >>>  Ideally, a Maven Archetype could be created that would allow one
> >> >> rapidly
> >> >> >>> produce a Solr webapp and fire it up in Jetty in mere seconds.
> >> >> >>
> >> >> >> How's that any different than cd example; java -jar start.jar?  Or
> do
> >> >> you
> >> >> >> mean a Solr client webapp?
> >> >> >>
> >> >> >>> Finally, with projects such as Bobo, integration with Spring
> would
> >> make
> >> >> >>> configuration more consistent and request significantly less java
> >> >> coding
> >> >> >>> just to add new capabilities everytime someone authors a new
> >> >> RequestHandler.
> >> >> >>
> >> >> >> It's one line of config to add a new request handler.  How many
> >> >> ridiculously
> >> >> >> ugly confusing lines of Spring XML would it take?
> >> >> >>
> >> >> >>>  The biggest thing I learned about Solr in my work thusfar is
> that
> >> >> patches
> >> >> >>> like these could be standalone modules in separate projects if it
> >> >> weren't
> >> >> >>> for having to hack the configuration and solrj methods up to
> adopt
> >> >> them.
> >> >> >>>  Which brings me to SolrJ, great API if it would stay generic and
> >> have
> >> >> less
> >> >> >>> concern for adding method each time some custom collections an

example for searching hibernate entities

2010-06-22 Thread fachhoch

I have a complex data model with bidirectional relations and use Hibernate
as the ORM provider, so I have several model objects representing the data
model. Altogether my model objects number 75 to 100, and each table in my
database has several records, around 20,000.
Please suggest: in my case, will text search help me?
Are there any examples of searching on Hibernate entities?

  


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/example-for-searching-hibernate-entities-tp914279p914279.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance sorting multivalued field

2010-06-22 Thread Erick Erickson
Curiosity is good. Do be aware, though, that the behavior is not
guaranteed;
it's just how things happen to work and may change without warning.

Erick

On Tue, Jun 22, 2010 at 4:01 AM, Marc Sturlese wrote:

>
> >>Well, sorting requires that all the unique values in the target field
> >>get loaded into memory
> That's what I thought, thanks.
>
> >>But a larger question is whether what your doing is worthwhile
> >>even as just a measurement. You say
> >>"This is good for me, I don't care for my tests". I claim that
> >>you do care
> I just like playing with things. First I checked the behavior of sorting on
> a multiValued field, and what I noticed was: let's say you have docs with a
> field
> called 'num':
> doc1->num:2; doc2->num:1,num:4; doc3->num:5
> Sorting asc by the field num, what I get is: doc2, doc1, doc3.
> The behavior seems to be always the same (I am not saying it works like
> that,
> but it's what I've seen in my examples).
> After seeing that I just decided to check the performance. The point is
> simply curiosity.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p913626.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching across multiple repeating fields

2010-06-22 Thread Geert-Jan Brits
Perhaps my answer is useless, because I don't have an answer to your direct
question, but:
you *might* want to consider whether your concept of a Solr document is on
the correct granular level, i.e.:

the problem you posted could be tackled (afaik) by defining a document as a
'sub-event' with only 1 daterange.
So each event doc you have now is replaced by several sub-event docs in
this proposed situation.

Additionally, each sub-event doc gets an additional field 'parent-eventid',
which maps to something like an event-id (which you're probably using).
So several sub-event docs can point to the same event-id.

Lastly, all sub-event docs belonging to a particular event implement all the
other fields that you may have stored in that particular event-doc.

Now you can query for events based on date ranges like you envisioned, but
instead of returning events you return sub-event docs. However, since all
data of the original event (except the multiple dateranges) is available in
the sub-event doc, this shouldn't really bother the client. If you need to
display all dates of an event (the only info missing from the returned
solr-doc) you could easily store them in an RDB and fetch them using the
defined parent-eventid.
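
For illustration, one event with two dateranges would become two sub-event
docs along these lines (the id scheme and field names are assumptions):

    <add>
      <doc>
        <field name="id">event42_1</field>
        <field name="parent-eventid">event42</field>
        <field name="daterange">19820402,19820614</field>
      </doc>
      <doc>
        <field name="id">event42_2</field>
        <field name="parent-eventid">event42</field>
        <field name="daterange">1990,2000</field>
      </doc>
    </add>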

The only caveat I see is that multiple sub-events with the same
'parent-eventid' might possibly get returned for a particular query.
This, however, depends on the type of queries you envision, i.e.:
1)  If you always issue queries with date filters, and *assuming* that
sub-events of a particular event don't temporally overlap, you will never
get multiple sub-events returned.
2)  If 1) doesn't hold, and assuming you *do* mind multiple sub-events of
the same actual event, you could try to use Field Collapsing on
'parent-eventid' to only return the first sub-event per parent-eventid that
matches the rest of your query. (Note, however, that Field Collapsing is a
patch at the moment: http://wiki.apache.org/solr/FieldCollapsing)

Not sure if this helped you at all, but at the very least it was a nice
conceptual exercise ;-)

Cheers,
Geert-Jan


2010/6/22 Mark Allan 

> Hi all,
>
> Firstly, I apologise for the length of this email but I need to describe
> properly what I'm doing before I get to the problem!
>
> I'm working on a project just now which requires the ability to store and
> search on temporal coverage data - ie. a field which specifies a date range
> during which a certain event took place.
>
> I hunted around for a few days and couldn't find anything which seemed to
> fit, so I had a go at writing my own field type based on solr.PointType.
>  It's used as follows:
>  schema.xml
> <fieldType name="daterange" class="..." dimension="2" subFieldSuffix="_i"/>
> <field name="daterange" type="daterange" multiValued="true"/>
>  data.xml
>
>    <add>
>    <doc>
>    ...
>    <field name="daterange">1940,1945</field>
>    </doc>
>    </add>
>
> Internally, this gets stored as:
>    <str name="daterange">1940,1945</str>
>    <int name="daterange_0_i">1940</int>
>    <int name="daterange_1_i">1945</int>
>
> In due course, I'll declare the subfields as a proper date type, but in the
> meantime, this works absolutely fine.  I can search for an individual date
> and Solr will check (queryDate > daterange_0 AND queryDate < daterange_1 )
> and the correct documents are returned.  My code also allows the user to
> input a date range in the query but I won't complicate matters with that
> just now!
>
> The problem arises when a document has more than one "daterange" field
> (imagine a news broadcast which covers a variety of topics and hence time
> periods).
>
> A document with two daterange fields
>    <doc>
>    ...
>    <field name="daterange">19820402,19820614</field>
>    <field name="daterange">1990,2000</field>
>    </doc>
>
> gets stored internally as
>    <arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
>    <arr name="daterange_0_i"><int>19820402</int><int>1990</int></arr>
>    <arr name="daterange_1_i"><int>19820614</int><int>2000</int></arr>
>
> In this situation, searching for 1985 should yield zero results as it is
> contained within neither daterange, however, the above document is returned
> in the result set.  What Solr is doing is checking that the queryDate (1985)
> is greater than *any* of the values in daterange_0 AND queryDate is less
> than *any* of the values in daterange_1.
>
> How can I get Solr to respect the positions of each item in the daterange_0
> and _1 arrays?  Ideally I'd like the search to use the following logic, thus
> preventing the above document from being returned in a search for 1985:
>(queryDate > daterange_0[0] AND queryDate < daterange_1[0]) OR
> (queryDate > daterange_0[1] AND queryDate < daterange_1[1])
>
> Someone else had a very similar problem recently on the mailing list with a
> multiValued PointType field but the thread went cold without a final
> solution.
>
> While I could filter the results when they get back to my application
> layer, it seems like it's not really the right place to do it.
>
> Any help getting Solr to respect the positions of items in arrays would be
> very gratefully received.
>
> Many thanks,
> Mark
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>


RE: example for searching hibernate entities

2010-06-22 Thread Fornoville, Tom
Have you already looked at Hibernate Search?
It combines Hibernate ORM with indexing/searching functionality of
Lucene.
The latest version even comes with the Solr analyzers.

http://www.hibernate.org/subprojects/search.html
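
A minimal sketch of what mapping an entity looks like (Hibernate Search
3.x annotations; the entity and field names here are made up):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.hibernate.search.annotations.Field;
    import org.hibernate.search.annotations.Indexed;

    @Entity
    @Indexed                  // index this entity in Lucene
    public class Order {
        @Id @GeneratedValue
        private Long id;

        @Field                // tokenized and made full-text searchable
        private String description;
    }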

Regards,
Tom

-Original Message-
From: fachhoch [mailto:fachh...@gmail.com] 
Sent: dinsdag 22 juni 2010 16:23
To: solr-user@lucene.apache.org
Subject: example for searching hibernate entities


I have a complex data model with bidirectional relations and use Hibernate
as the ORM provider, so I have several model objects representing the data
model. Altogether my model objects number 75 to 100, and each table in my
database has several records, around 20,000.
Please suggest: in my case, will text search help me?
Are there any examples of searching on Hibernate entities?

  


-- 
View this message in context:
http://lucene.472066.n3.nabble.com/example-for-searching-hibernate-entit
ies-tp914279p914279.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OOM on sorting on dynamic fields

2010-06-22 Thread Erick Erickson
Hmmm. A couple of details I'm wondering about. How many
documents are we talking about in your index? Do you get
OOMs when you start fresh or does it take a while?

You've done some good investigations, so it seems like there
could well be something else going on here than just "the usual
suspects" of sorting

I'm wondering if you aren't really closing readers somehow.
Are you updating your index frequently and re-opening readers often?
If so, how?

I'm assuming that if you do NOT sort on all these fields, you don't have
the problem, is that true?

Best
Erick

On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio <
matteo.fiande...@gmail.com> wrote:

> Hello,
> we are experiencing OOM exceptions in our single core solr instance
> (on a (huge) amazon EC2 machine).
> We investigated a lot in the mailing list and through jmap/jhat dump
> analyzing and the problem resides in the lucene FieldCache that fills
> the heap and blows up the server.
>
> Our index is quite small, but we have a lot of sort queries on dynamic
> fields of type long representing timestamps, which are not present in
> all the documents.
> Those queries apply sorting on 12-15 of those fields.
>
> We are using solr 1.4 in production and the dump shows a lot of
> Integer/Character and Byte arrays filled up with 0s.
> With solr's trunk code things do not change.
>
> In the mailing list we saw a lot of messages related to this issue:
> we tried truncating the dates to day precision, using missingSortLast =
> true, changing the field type from slong to long, setting autowarming to
> different values, and disabling and enabling caches with different values,
> but we did not manage to solve the problem.
>
> We were thinking of implementing an LRUFieldCache field type to manage
> the FieldCache as an LRU and prevent this, but before starting a new
> development, we want to be sure that we are not doing anything wrong
> in the solr configuration or in the index generation.
>
> Any help would be appreciated.
> Regards,
> Matteo
>


Field missing when use distributed search + dismax

2010-06-22 Thread Scott Zhang
Hi. All.
   I was using distributed search over 30 Solr instances; previously I was
using the standard query handler, and the results were returned correctly:
each result has 2 fields, "ID" and "type".
   Today I wanted to search with dismax. I tried searching each instance
with dismax and it works correctly, returning "ID" and "type" for each
result. The strange thing is that when I
use distributed search, the results only have "ID". The field "type"
disappeared. I need that "type" to know what the "ID" refers to. Why does
Solr "eat" my "type"?


Thanks.
Regards.
Scott


Re: anyone use hadoop+solr?

2010-06-22 Thread Neeb

Hey James,

Just wondering if you ever had a chance to try out hadoop with solr? Would
appreciate any information/directions you could give.

I am particularly interested in indexing using a mapreduce job.

Cheers,
-Ali
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914450.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr with hadoop

2010-06-22 Thread Neeb

Hi,

We currently have a master-slave setup for Solr with two slave servers. We
are using SolrJ (StreamingUpdateSolrServer) to index on the master, which
takes 6 hours to index around 15 million documents.

I would like to explore Hadoop, particularly for the indexing job using a
MapReduce approach.

- I have read some comments on the JIRA tickets, but it still seems unclear
how this setup will work.
- I am not sure what tasks will be done in the map phase and what in the
reduce phase.
- And would it merge the multiple indices together into one during the
reduce phase, or is this a separate task outside of MapReduce?

Any directions and guidance over this setup would be highly appreciated.

Thanks in advance,
-Ali
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914483.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ

2010-06-22 Thread Jan Høydahl / Cominvent
Hi,

Sometimes I do both. I put the defaults in solrconfig.xml and thus have one
place to define all kinds of low-level default settings.

But then I provide a possibility in the application space to add/override any
parameters as well. This gives you great flexibility to let server
administrators (with access to solrconfig.xml) tune low-level stuff, but also
gives programmers a middle layer to put domain-space config in, instead of
locking it down on the search node or up in the web interfaces.
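
A small sketch of that middle layer with SolrJ (the handler name and the
overridden parameters here are assumptions; the defaults would live in
solrconfig.xml):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SearchLayer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("laptop");
            q.setParam("qt", "/products"); // handler whose defaults sit in solrconfig.xml
            q.setRows(20);                 // application-level override of a default
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getResults().getNumFound());
        }
    }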

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 21. juni 2010, at 22.29, Saïd Radhouani wrote:

> I completely agreed. Thanks a lot!
> 
> -S
> 
> On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote:
> 
>> Why would someone port the Solr config into servlet code?
>> IMO the first option would be the best choice; one obvious reason is that
>> when you alter the Solr config you only need to restart the server, whereas
>> changing the source forces you to redeploy your app and restart the
>> server.
>> 
>> 
>> 
>> On 6/21/10, Saïd Radhouani  wrote:
>>> 
>>> Hello,
>>> 
>>> I'm developing a Web application that communicates with Solr using SolrJ. I
>>> have three search interfaces, and I'm facing two options:
>>> 
>>> 1- Configuring one SearchHandler per search interface in solrconfig.xml
>>> 
>>> Or
>>> 
>>> 2- Writing the configuration in the Java servlet code that is using SolrJ
>>> 
>>> Is there any significant difference between these two options? If yes,
>>> what's the best choice?
>>> 
>>> Thanks,
>>> 
>>> -Saïd
>> 
>> 
>> 
>> 
>> -- 
>> Abdelhamid ABID
>> Software Engineer- J2EE / WEB
> 



Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ

2010-06-22 Thread Sven Maurmann

Hi,

there are reasons for both options. Usually it is a good idea to put the
default configuration into the solrconfig.xml (and even fix some of the
configuration) in order to have simple client-side code.

But sometimes it is necessary to have some flexibility for the actual query.
In this situation one would use the client-side approach. If done right,
this does not mean putting the parameters in the servlet code.

Cheers,
Sven

--On Dienstag, 22. Juni 2010 17:52 +0200 "Jan Høydahl / Cominvent" 
 wrote:



Hi,

Sometimes I do both. I put the defaults in solrconfig.xml and thus have
one place to define all kind of low-level default settings.

But then I make a possibility in the application space to add/override
any parameters as well. This gives you great flexibility to let server
administrators (with access to solrconfig.xml) tune low level stuff, but
also gives programmers a middle layer to put domain-space config instead
of locking it down on the search node or up in the web interfaces.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 21. juni 2010, at 22.29, Saïd Radhouani wrote:


I completely agreed. Thanks a lot!

-S

On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote:


Why would someone port the Solr config into servlet code?
IMO the first option would be the best choice; one obvious reason is
that when you alter the Solr config you only need to restart the server,
whereas changing the source forces you to redeploy your app and
restart the server.



On 6/21/10, Saïd Radhouani  wrote:


Hello,

I'm developing a Web application that communicates with Solr using
SolrJ. I have three search interfaces, and I'm facing two options:

1- Configuring one SearchHandler per search interface in solrconfig.xml

Or

2- Writing the configuration in the Java servlet code that is using SolrJ

Is there any significant difference between these two options? If yes,
what's the best choice?

Thanks,

-Saïd





--
Abdelhamid ABID
Software Engineer- J2EE / WEB


Re: anyone use hadoop+solr?

2010-06-22 Thread Marc Sturlese

I think there are people using this patch in production:
https://issues.apache.org/jira/browse/SOLR-1301
I have tested it myself, indexing data from CSV and from HBase, and it works
properly.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914553.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr with hadoop

2010-06-22 Thread Marc Sturlese

I think a good solution could be to use Hadoop with SOLR-1301 to build Solr
shards and then run Solr distributed search against those shards (you will
have to copy them from HDFS to local disk to search against them).
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914576.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Neeb

Thanks Marc,

Well, I have an HBase storage architecture and a Solr master-slave setup with
two slave servers.

Would this patch work with my setup? Do I need sharding in place? And what
tasks would be run in the map and reduce phases?

I was thinking something like:

At map: read documents as key/value pairs, convert them to SolrInputDocuments
and add them to the server.
At reduce: merge indexes, then commit and optimize?

Also, are there any quick guidelines on how to get started with this setup? I
am new to Hadoop as well as fairly new to Solr.

Appreciate your help,
-A


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914587.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr with hadoop

2010-06-22 Thread MitchK

I wanted to add a Jira issue about exactly what Otis is asking here.
Unfortunately, I don't have time for it because of my exams.

However, I'd like to add a question to Otis' ones:
if you distribute the indexing progress this way, are you able to replicate
the different documents correctly?

Thank you.
- Mitch

Otis Gospodnetic-2 wrote:
> 
> Stu,
> 
> Interesting!  Can you provide more details about your setup?  By "load
> balance the indexing stage" you mean "distribute the indexing process",
> right?  Do you simply take your content to be indexed, split it into N
> chunks where N matches the number of TaskNodes in your Hadoop cluster and
> provide a map function that does the indexing?  What does the reduce
> function do?  Does that call IndexWriter.addAllIndexes or do you do that
> outside Hadoop?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> - Original Message 
> From: Stu Hood 
> To: solr-user@lucene.apache.org
> Sent: Monday, January 7, 2008 7:14:20 PM
> Subject: Re: solr with hadoop
> 
> As Mike suggested, we use Hadoop to organize our data en route to Solr.
>  Hadoop allows us to load balance the indexing stage, and then we use
>  the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
>  hosted on Solr instances.
> 
> Thanks,
> Stu
> 
> 
> 
> -Original Message-
> From: Mike Klaas 
> Sent: Friday, January 4, 2008 3:04pm
> To: solr-user@lucene.apache.org
> Subject: Re: solr with hadoop
> 
> On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
> 
>> I have huge index base (about 110 millions documents, 100 fields  
>> each). But size of the index base is reasonable, it's about 70 Gb.  
>> All I need is increase performance, since some queries, which match  
>> big number of documents, are running slow.
>> So I was thinking: are there any benefits to using Hadoop for this? And if
>> so, what direction should I go? Has anybody done something to
>> integrate Solr with Hadoop? Does it give any performance boost?
>>
> Hadoop might be useful for organizing your data enroute to Solr, but  
> I don't see how it could be used to boost performance over a huge  
> Solr index.  To accomplish that, you need to split it up over two  
> machines (for which you might find hadoop useful).
> 
> -Mike
> 
> 
> 
> 
> 
> 
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Blargy

Neeb,

Seems like we are in the same boat. Our index consists of 5M records, which
roughly equals around 30 gigs. All in all that's not too bad; however, our
indexing process (we use DIH, but I'm now revisiting that idea) takes a
whopping 30+ hours!

I just bought the Hadoop in Action early edition but haven't had time to
read it yet. I was wondering what resources you are using to learn Hadoop
and, more importantly, its applications to Solr. Would you mind explaining
your thought process on how you will be using Hadoop in more detail?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914606.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Import Handler Rich Format Documents

2010-06-22 Thread Tod

On 6/18/2010 2:42 PM, Chris Hostetter wrote:

: > I don't think DIH can do that, but who knows, let's see what others say.

: Looks like the ExtractingRequestHandler uses Tika as well.  I might just use
: this but I'm wondering if there will be a large performance difference between
: using it to batch content in over rolling my own Transformer?

I'm confused ... You're using DIH, and some of your fields are URLs to 
documents that you want to parse with Tika?


Why would you need a custom Transformer?

http://wiki.apache.org/solr/DataImportHandler#Tika_Integration
http://wiki.apache.org/solr/TikaEntityProcessor

-Hoss


Ok, I'm trying to integrate the TikaEntityProcessor as suggested.  I'm 
using Solr Version: 1.4.0 and getting the following error:


java.lang.ClassNotFoundException: Unable to load BinURLDataSource or 
org.apache.solr.handler.dataimport.BinURLDataSource


curl -s http://test.html|curl 
http://localhost:9080/solr/update/extract?extractOnly=true --data-binary 
@-  -H 'Content-type:text/html'


... works fine so presumably my Tika processor is working.


My data-config.xml looks like this:

<dataConfig>
  <dataSource ... />
  <document>
    <entity name="my_database" ...>
      ...
      <entity name="my_database_url"
              query="select CONTENT_URL from my_database where
                     content_id='${my_database.CONTENT_ID}'">
        <entity processor="TikaEntityProcessor"
                url="http://www.mysite.com/${my_database.content_url}" ...>
          ...
        </entity>
      </entity>
      ...
    </entity>
  </document>
</dataConfig>

I added the entity name="my_database_url" section to an existing 
(working) database entity to be able to have Tika index the content 
pointed to by the content_url.


Is there anything obviously wrong with what I've tried so far?


Thanks - Tod


Re: anyone use hadoop+solr?

2010-06-22 Thread Marc Sturlese

Well, the patch consumes the data from a CSV. You have to modify the input to
use TableInputFormat (I don't remember if it's called exactly like that) and
it will work.
Once you've done that, you have to specify as many reducers as the shards you
want.

I know 2 ways to index using Hadoop.
Method 1 (SOLR-1301 & Nutch):
-Map: just gets data from the source and creates key-value pairs
-Reduce: does the analysis and indexes the data
So the index is built on the reducer side.

Method 2 (Hadoop Lucene index contrib):
-Map: does the analysis and opens an IndexWriter to add docs
-Reduce: merges the small indexes built in the map
So indexes are built on the map side.
Method 2 has no good integration with Solr at the moment.

In the JIRA (SOLR-1301) there's a good explanation of the advantages and
disadvantages of indexing on the map or reduce side. I recommend you read
all the comments on the JIRA in detail to know exactly how it works.
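
A bare-bones sketch of "method 1" in Hadoop's mapreduce API (this is NOT
the SOLR-1301 code; the class names, CSV input assumption and id-based
routing are all illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class ShardIndexer {

        // Map side: no analysis, just parse the record and route it by key.
        public static class IndexMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String id = line.toString().split(",", 2)[0]; // first CSV column = doc id
                ctx.write(new Text(id), line);
            }
        }

        // Reduce side: one reducer per shard; a real job would open an
        // EmbeddedSolrServer/IndexWriter in setup() and add documents here.
        public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text id, Iterable<Text> docs, Context ctx)
                    throws IOException, InterruptedException {
                for (Text doc : docs) {
                    ctx.write(id, doc); // placeholder for analyze + addDocument
                }
            }
        }
    }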


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Muneeb Ali

Hi Blargy,

Nice to hear that I am not alone ;) 

Well we have been using Hadoop for other data-intensive services, those that
can be done in parallel. We have multiple nodes, which are used by Hadoop
for all our MapReduce jobs. I personally don't have much experience with its
use and hence wouldn't be able to help you much with that.

Our indexing takes 6+ hours to index 15 million documents (using
solrj.StreamingUpdateSolrServer). I wanted to explore Hadoop for this task, as
it can be done in parallel.

I have just started investigating this and will keep this post updated if I
find anything helpful.
 
-Neeb 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914659.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OOM on sorting on dynamic fields

2010-06-22 Thread Matteo Fiandesio
Hi Erick,
the index is quite small (1691145 docs) but sorting is massive and
often on unique timestamp fields.

OOM occur after a range of time between three and four hours.
Depending as well if users browse a part of the application.

We use solrj to make the queries so we did not use Readers objects directly.

Without sorting we don't see the problem
Regards,
Matteo
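
A rough back-of-the-envelope for those numbers (assumed object sizes): the
FieldCache allocates one int[maxDoc] per sorted field, so 1,691,145 docs x
4 bytes x ~15 fields is only ~100 MB of ordinal arrays. But each field also
keeps an array of its unique values, and unique per-document timestamps
stored as strings (slong) can easily cost 40+ bytes each, i.e. another
~65 MB per field and roughly 1 GB across 15 fields. With dynamic fields,
every distinct field name sorted on adds its own set of arrays, which fits
the OOM appearing only after hours of varied browsing.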

On 22 June 2010 17:01, Erick Erickson  wrote:
> H.. A couple of details I'm wondering about. How many
> documents are we talking about in your index? Do you get
> OOMs when you start fresh or does it take a while?
>
> You've done some good investigations, so it seems like there
> could well be something else going on here than just "the usual
> suspects" of sorting
>
> I'm wondering if you aren't really closing readers somehow.
> Are you updating your index frequently and re-opening readers often?
> If so, how?
>
> I'm assuming that if you do NOT sort on all these fields, you don't have
> the problem, is that true?
>
> Best
> Erick
>
> On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio <
> matteo.fiande...@gmail.com> wrote:
>
>> Hello,
>> we are experiencing OOM exceptions in our single core solr instance
>> (on a (huge) amazon EC2 machine).
>> We investigated a lot in the mailing list and through jmap/jhat dump
>> analyzing and the problem resides in the lucene FieldCache that fills
>> the heap and blows up the server.
>>
>> Our index is quite small, but we have a lot of sort queries on dynamic
>> fields of type long representing timestamps, which are not present in
>> all the documents.
>> Those queries apply sorting on 12-15 of those fields.
>>
>> We are using solr 1.4 in production and the dump shows a lot of
>> Integer/Character and Byte arrays filled up with 0s.
>> With solr's trunk code things do not change.
>>
>> In the mailing list we saw a lot of messages related to this issue:
>> we tried truncating the dates to day precision, using missingSortLast =
>> true, changing the field type from slong to long, setting autowarming to
>> different values, and disabling and enabling caches with different values,
>> but we did not manage to solve the problem.
>>
>> We were thinking of implementing an LRUFieldCache field type to manage
>> the FieldCache as an LRU and prevent this, but before starting a new
>> development, we want to be sure that we are not doing anything wrong
>> in the solr configuration or in the index generation.
>>
>> Any help would be appreciated.
>> Regards,
>> Matteo
>>
>


Re: solr with hadoop

2010-06-22 Thread Jon Baer
I was playing around w/ Sqoop the other day, its a simple Cloudera tool for 
imports (mysql -> hdfs) @ http://www.cloudera.com/developers/downloads/sqoop/

It seems to me it would be pretty efficient to dump to HDFS and have
something like the Data Import Handler read from hdfs:// directly ...

Has this route been discussed / developed before (ie DIH w/ hdfs:// handler)?

- Jon

On Jun 22, 2010, at 12:29 PM, MitchK wrote:

> 
> I wanted to add a Jira-issue about exactly what Otis is asking here.
> Unfortunately, I haven't time for it because of my exams.
> 
> However, I'd like to add a question to Otis' ones:
> If you distribute the indexing process this way, are you able to replicate
> the different documents correctly?
> 
> Thank you.
> - Mitch
> 
> Otis Gospodnetic-2 wrote:
>> 
>> Stu,
>> 
>> Interesting!  Can you provide more details about your setup?  By "load
>> balance the indexing stage" you mean "distribute the indexing process",
>> right?  Do you simply take your content to be indexed, split it into N
>> chunks where N matches the number of TaskNodes in your Hadoop cluster and
>> provide a map function that does the indexing?  What does the reduce
>> function do?  Does that call IndexWriter.addAllIndexes or do you do that
>> outside Hadoop?
>> 
>> Thanks,
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> 
>> - Original Message 
>> From: Stu Hood 
>> To: solr-user@lucene.apache.org
>> Sent: Monday, January 7, 2008 7:14:20 PM
>> Subject: Re: solr with hadoop
>> 
>> As Mike suggested, we use Hadoop to organize our data en route to Solr.
>> Hadoop allows us to load balance the indexing stage, and then we use
>> the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
>> hosted on Solr instances.
>> 
>> Thanks,
>> Stu
>> 
>> 
>> 
>> -Original Message-
>> From: Mike Klaas 
>> Sent: Friday, January 4, 2008 3:04pm
>> To: solr-user@lucene.apache.org
>> Subject: Re: solr with hadoop
>> 
>> On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
>> 
>>> I have huge index base (about 110 millions documents, 100 fields  
>>> each). But size of the index base is reasonable, it's about 70 Gb.  
>>> All I need is increase performance, since some queries, which match  
>>> big number of documents, are running slow.
>>> So I was thinking is any benefits to use hadoop for this? And if  
>>> so, what direction should I go? Is anybody did something for  
>>> integration Solr with Hadoop? Does it give any performance boost?
>>> 
>> Hadoop might be useful for organizing your data enroute to Solr, but  
>> I don't see how it could be used to boost performance over a huge  
>> Solr index.  To accomplish that, you need to split it up over two  
>> machines (for which you might find hadoop useful).
>> 
>> -Mike
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: OOM on sorting on dynamic fields

2010-06-22 Thread Erick Erickson
Hmmm, I'm missing something here then. Sorting over 15 fields of type long
shouldn't use much memory, even if all the values are unique. When you say
"12-15 dynamic fields", are you talking about 12-15 fields per query out of
XXX total fields? And is XXX large? At a guess, how many different fields
do
you think you're sorting over, cumulatively, by the time you get your OOM?
Note if you sort over the field "erick_time" in 10 different queries, I'm
only counting that as 1 field. I guess another way of asking this is
"how many dynamic fields are there total?".

If this is really a sorting issue, you should be able to force this to
happen
almost immediately by firing off enough sort queries at the server. It'll
tell you a lot if you can't make this happen, even on a relatively small
test machine.

Best
Erick

On Tue, Jun 22, 2010 at 12:59 PM, Matteo Fiandesio <
matteo.fiande...@gmail.com> wrote:

> Hi Erick,
> the index is quite small (1691145 docs) but sorting is massive and
> often on unique timestamp fields.
>
> OOM occur after a range of time between three and four hours.
> Depending as well if users browse a part of the application.
>
> We use solrj to make the queries so we did not use Readers objects
> directly.
>
> Without sorting we don't see the problem
> Regards,
> Matteo
>
> On 22 June 2010 17:01, Erick Erickson  wrote:
> > H.. A couple of details I'm wondering about. How many
> > documents are we talking about in your index? Do you get
> > OOMs when you start fresh or does it take a while?
> >
> > You've done some good investigations, so it seems like there
> > could well be something else going on here than just "the usual
> > suspects" of sorting
> >
> > I'm wondering if you aren't really closing readers somehow.
> > Are you updating your index frequently and re-opening readers often?
> > If so, how?
> >
> > I'm assuming that if you do NOT sort on all these fields, you don't have
> > the problem, is that true?
> >
> > Best
> > Erick
> >
> > On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio <
> > matteo.fiande...@gmail.com> wrote:
> >
> >> Hello,
> >> we are experiencing OOM exceptions in our single core solr instance
> >> (on a (huge) amazon EC2 machine).
> >> We investigated a lot in the mailing list and through jmap/jhat dump
> >> analyzing and the problem resides in the lucene FieldCache that fills
> >> the heap and blows up the server.
> >>
> >> Our index is quite small but we have a lot of sort queries  on fields
> >> that are dynamic,of type long representing timestamps and are not
> >> present in all the documents.
> >> Those queries apply sorting on 12-15 of those fields.
> >>
> >> We are using solr 1.4 in production and the dump shows a lot of
> >> Integer/Character and Byte Array filled up with 0s.
> >> With solr's trunk code things does not change.
> >>
> >> In the mailing list we saw a lot of messages related to this issues:
> >> we tried truncating the dates to day precision,using missingSortLast =
> >> true,changing the field type from slong to long,setting autowarming to
> >> different values,disabling and enabling caches with different values
> >> but we did not manage to solve the problem.
> >>
> >> We were thinking to implement an LRUFieldCache field type to manage
> >> the FieldCache as an LRU and preventing but, before starting a new
> >> development, we want to be sure that we are not doing anything wrong
> >> in the solr configuration or in the index generation.
> >>
> >> Any help would be appreciated.
> >> Regards,
> >> Matteo
> >>
> >
>


Change the Solr searcher

2010-06-22 Thread sarfaraz masood
I am a novice in Solr/Lucene, but I have gone
through the documentation of both. I have even implemented programs in
Lucene for searching, etc.

My problem is to apply a new search technique other than the one used by Solr.

Now, as I know, Lucene has its own searcher,
which is used by Solr as well.

*Question: Can't I replace this searcher part of
Solr with a Java program that returns documents as per my algorithm?

I.e., I only want to change the searcher part of Solr. I have
studied customizing the scoring, which is absolutely not my aim. My
aim is to replace the searcher.

Please help me in this regard. I will be highly grateful to you for your
assistance in this work of mine.

If any part of this mail was not clear to you then please let me know, and I
will explain it to you.

Regards

-sarfaraz



Re: Change the Solr searcher

2010-06-22 Thread Erik Hatcher
Sounds like what you want is to override Solr's "query" component.   
Have a look at the built-in one and go from there.
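
[Concretely, that usually means subclassing QueryComponent and registering
the subclass in solrconfig.xml. An untested sketch; the package and class
name are hypothetical:]

    import java.io.IOException;
    import org.apache.solr.handler.component.QueryComponent;
    import org.apache.solr.handler.component.ResponseBuilder;

    // Registered in solrconfig.xml with something like:
    //   <searchComponent name="query" class="com.example.MyQueryComponent"/>
    public class MyQueryComponent extends QueryComponent {
        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // Build your own ranked document list and set it on rb here,
            // or fall back to the stock Lucene-backed search:
            super.process(rb);
        }
    }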


Erik

On Jun 22, 2010, at 1:38 PM, sarfaraz masood wrote:


I am a novice in solr / lucene. but i have gone
thru the documentations of both.I have even implemented programs in
lucene for searching etc.

My problem is to apply a new search technique other than the one  
used by solr.


Now as i know that lucene has its own searcher
which is used by solr as well.

*Ques.. Cant i replace this searcher part in
SOLR by a java program that returns documents as per my algorithm ?

i.e I only want to change the searcher part of solr. I have
studied abt customizing the scoring which is absolutely not my aim.My
aim is replace the searcher.

Plz help me in this regards. I will be highly gratefull to you for  
your assistance in this work of mine.


If any part of this mail was not clear to you then plz lemme know, i  
will expain that you.


Regards

-sarfaraz





Performance related question on DISMAX handler..

2010-06-22 Thread bbarani

Hi,

I just want to know if there will be any overhead or performance degradation
if I use the dismax search handler instead of the standard search handler.

We are planning to index millions of documents and are not sure if using dismax
will slow down search performance. It would be great if someone could share
their thoughts.

Thanks,
BB
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-related-question-on-DISMAX-handler-tp914892p914892.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Jason Rutherglen
We (Attensity Group) have been using SOLR-1301 for 6+ months now
because we have a ready Hadoop cluster and need to be able to re/index
up to 3 billion docs.  I read the various emails and wasn't sure what
you're asking.

Cheers...

On Tue, Jun 22, 2010 at 8:27 AM, Neeb  wrote:
>
> Hey James,
>
> Just wondering if you ever had a chance to try out hadoop with solr? Would
> appreciate any information/directions you could give.
>
> I am particularly interested in indexing using a mapreduce job.
>
> Cheers,
> -Ali
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914450.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: example for searching hibernate entities

2010-06-22 Thread Peter Karich
As always: it depends.
Take a look into Hibernate Search also, which is Lucene-powered.

Peter.

> I have a complex data model with bidirectional relations and use Hibernate
> as the ORM provider, so I have several model objects representing the data
> model. Altogether my model objects number 75 to 100, and each table in my
> database has around 20,000 records.
> Please suggest: will text search help me in my case?
> Are there any examples of searching on Hibernate entities?



Re: anyone use hadoop+solr?

2010-06-22 Thread Blargy


Muneeb Ali wrote:
> 
> Hi Blargy,
> 
> Nice to hear that I am not alone ;) 
> 
> Well we have been using Hadoop for other data-intensive services, those
> that can be done in parallel. We have multiple nodes, which are used by
> Hadoop for all our MapReduce jobs. I personally don't have much experience
> with its use and hence wouldn't be able to help you much with that.
> 
> Our indexing takes 6+ hours to index 15 million documents (using
> solrj.streamUpdateSolrServer). I wanted to explore hadoop for this task,
> as it can be done in parallel.
> 
> I have just started investigating into this, will keep this post updated
> if found anything helpful.
>  
> -Neeb 
> 

Would you mind explaining how your full indexing strategy is implemented
using the StreamingUpdateSolrServer? I am currently only familiar with using
the DataImportHandler. Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p915227.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OOM on sorting on dynamic fields

2010-06-22 Thread Matteo Fiandesio
The fields I'm sorting on are dynamic, so one query sorts on
erick_time_1 and erick_timeA_1, another sorts on erick_time_2, and so
on. What we see in the heap is a lot of arrays, most of them filled
with 0s, maybe because these timestamp fields are not
present in all the documents.
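
[Back-of-the-envelope arithmetic for why this blows up, assuming Lucene's
FieldCache keeps one long[maxDoc] array per distinct sort field; the count
of 300 distinct fields is purely illustrative:]

    1,691,145 docs x 8 bytes per long[] entry ~= 13.5 MB per distinct sort field
    13.5 MB x 300 distinct dynamic fields     ~= 4 GB of FieldCache

[Since every per-user dynamic field is its own cache entry and entries are
never evicted, the heap grows with the number of distinct fields ever sorted
on, not with the 12-15 fields of any single query.]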

By the way,
I have a script that generates the OOM in 10 minutes on our Solr
instance, and with the temporary patch it ran without any problems.
The side effect is that when the cache is purged, the next query that
regenerates the cache is a little slower.

I'm aware that the solution is inelegant, and we are investigating
solving the problem in another way.
Regards,
Matteo


On 22 June 2010 19:25, Erick Erickson  wrote:
> Hmmm, I'm missing something here then. Sorting over 15 fields of type long
> shouldn't use much memory, even if all the values are unique. When you say
> "12-15 dynamic fields", are you talking about 12-15 fields per query out of
> XXX total fields? And is XXX large? At a guess, how many different fields
> do
> you think you're sorting over cumulative by the time you get your OOM?
> Note if you sort over the field "erick_time" in 10 different queries, I'm
> only counting that as 1 field. I guess another way of asking this is
> "how many dynamic fields are there total?".
>
> If this is really a sorting issue, you should be able to force this to
> happen
> almost immediately by firing off enough sort queries at the server. It'll
> tell you a lot if you can't make this happen, even on a relatively small
> test machine.
>
> Best
> Erick
>
> On Tue, Jun 22, 2010 at 12:59 PM, Matteo Fiandesio <
> matteo.fiande...@gmail.com> wrote:
>
>> Hi Erick,
>> the index is quite small (1691145 docs) but sorting is massive and
>> often on unique timestamp fields.
>>
>> OOM occur after a range of time between three and four hours.
>> Depending as well if users browse a part of the application.
>>
>> We use solrj to make the queries so we did not use Readers objects
>> directly.
>>
>> Without sorting we don't see the problem
>> Regards,
>> Matteo
>>
>> On 22 June 2010 17:01, Erick Erickson  wrote:
>> > H.. A couple of details I'm wondering about. How many
>> > documents are we talking about in your index? Do you get
>> > OOMs when you start fresh or does it take a while?
>> >
>> > You've done some good investigations, so it seems like there
>> > could well be something else going on here than just "the usual
>> > suspects" of sorting
>> >
>> > I'm wondering if you aren't really closing readers somehow.
>> > Are you updating your index frequently and re-opening readers often?
>> > If so, how?
>> >
>> > I'm assuming that if you do NOT sort on all these fields, you don't have
>> > the problem, is that true?
>> >
>> > Best
>> > Erick
>> >
>> > On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio <
>> > matteo.fiande...@gmail.com> wrote:
>> >
>> >> Hello,
>> >> we are experiencing OOM exceptions in our single core solr instance
>> >> (on a (huge) amazon EC2 machine).
>> >> We investigated a lot in the mailing list and through jmap/jhat dump
>> >> analyzing and the problem resides in the lucene FieldCache that fills
>> >> the heap and blows up the server.
>> >>
>> >> Our index is quite small but we have a lot of sort queries  on fields
>> >> that are dynamic,of type long representing timestamps and are not
>> >> present in all the documents.
>> >> Those queries apply sorting on 12-15 of those fields.
>> >>
>> >> We are using solr 1.4 in production and the dump shows a lot of
>> >> Integer/Character and Byte Array filled up with 0s.
>> >> With solr's trunk code things does not change.
>> >>
>> >> In the mailing list we saw a lot of messages related to this issues:
>> >> we tried truncating the dates to day precision,using missingSortLast =
>> >> true,changing the field type from slong to long,setting autowarming to
>> >> different values,disabling and enabling caches with different values
>> >> but we did not manage to solve the problem.
>> >>
>> >> We were thinking to implement an LRUFieldCache field type to manage
>> >> the FieldCache as an LRU and preventing but, before starting a new
>> >> development, we want to be sure that we are not doing anything wrong
>> >> in the solr configuration or in the index generation.
>> >>
>> >> Any help would be appreciated.
>> >> Regards,
>> >> Matteo
>> >>
>> >
>>
>


Help with highlighting

2010-06-22 Thread noel
Hi, I need help with highlighting the fields that match a query. So far, my
results only highlight matches in the all_text field, and I would like it to
highlight other fields too; simply turning highlighting on doesn't do it. Any
ideas why it only applies to all_text? Here is my schema:

[schema.xml snipped: the XML markup was stripped by the list archive. All
that survives is the uniqueKey ("unique_key") and the defaultSearchField
("all_text").]



Re: Field Collapsing SOLR-236

2010-06-22 Thread Martijn v Groningen
What exactly did not work? Patching, compiling or running it?

On 22 June 2010 16:06, Rakhi Khatwani  wrote:
> Hi,
>      I tried checking out the latest code (rev 956715) the patch did not
>      I tried checking out the latest code (rev 956715), but the patch did not
> work on it.
> In fact I even tried hunting for the revision mentioned earlier in this
> thread (i.e. rev 955615) but cannot find it in the repository (it has
> revision 955569 followed by revision 955785).
> Any pointers??
> Regards
> Raakhi
>
> On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen <
> martijn.is.h...@gmail.com> wrote:
>
>> Oh in that case is the code stable enough to use it for production?
>>     -  Well this feature is a patch and I think that says it all.
>> Although bugs are fixed it is definitely an experimental feature
>> and people should keep that in mind when using one of the patches.
>> Does it support features which solr 1.4 normally supports?
>>    - As far as I know yes.
>>
>> am using facets as a workaround but then i am not able to sort on any
>> other field. is there any workaround to support this feature??
>>    - Maybe http://wiki.apache.org/solr/Deduplication prevents
>> adding duplicates to your index, but then you miss the collapse counts
>> and other computed values
>>
>> On 21 June 2010 09:04, Rakhi Khatwani  wrote:
>> > Hi,
>> >    Oh in that case is the code stable enough to use it for production?
>> > Does it support features which solr 1.4 normally supports?
>> >
>> > I am using facets as a workaround but then i am not able to sort on any
>> > other field. is there any workaround to support this feature??
>> >
>> > Regards,
>> > Raakhi
>> >
>> > On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen <
>> > martijn.is.h...@gmail.com> wrote:
>> >
>> >> Hi Rakhi,
>> >>
>> >> The patch is not compatible with 1.4. If you want to work with the
>> >> trunk. I'll need to get the src from
>> >> https://svn.apache.org/repos/asf/lucene/dev/trunk/
>> >>
>> >> Martijn
>> >>
>> >> On 18 June 2010 13:46, Rakhi Khatwani  wrote:
>> >> > Hi Moazzam,
>> >> >
>> >> >                  Where did u get the src code from??
>> >> >
>> >> > I am downloading it from
>> >> > https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
>> >> >
>> >> > and the latest revision in this location is 955469.
>> >> >
>> >> > so applying the latest patch(dated 17th june 2010) on it still
>> generates
>> >> > errors.
>> >> >
>> >> > Any Pointers?
>> >> >
>> >> > Regards,
>> >> > Raakhi
>> >> >
>> >> >
>> >> > On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan 
>> >> wrote:
>> >> >
>> >> >> I knew it wasn't me! :)
>> >> >>
>> >> >> I found the patch just before I read this and applied it to the trunk
>> >> >> and it works!
>> >> >>
>> >> >> Thanks Mark and martijn for all your help!
>> >> >>
>> >> >> - Moazzam
>> >> >>
>> >> >> On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen
>> >> >>  wrote:
>> >> >> > I've added a new patch to the issue, so building the trunk (rev
>> >> >> > 955615) with the latest patch should not be a problem. Due to
>> recent
>> >> >> > changes in the Lucene trunk the patch was not compatible.
>> >> >> >
>> >> >> > On 17 June 2010 20:20, Erik Hatcher 
>> wrote:
>> >> >> >>
>> >> >> >> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
>> >> >> >>>
>> >> >> >>> p.s. I'd be glad to contribute our Maven build re-organization
>> back
>> >> to
>> >> >> the
>> >> >> >>> community to get Solr properly Mavenized so that it can be
>> >> distributed
>> >> >> and
>> >> >> >>> released more often.  For us the benefit of this structure is
>> that
>> >> we
>> >> >> will
>> >> >> >>> be able to overlay addons such as RequestHandlers and other third
>> >> party
>> >> >> >>> support without having to rebuild Solr from scratch.
>> >> >> >>
>> >> >> >> But you don't have to rebuild Solr from scratch to add a new
>> request
>> >> >> handler
>> >> >> >> or other plugins - simply compile your custom stuff into a JAR and
>> >> put
>> >> >> it in
>> >> >> >> /lib (or point to it with  in solrconfig.xml).
>> >> >> >>
>> >> >> >>>  Ideally, a Maven Archetype could be created that would allow one
>> >> >> rapidly
>> >> >> >>> produce a Solr webapp and fire it up in Jetty in mere seconds.
>> >> >> >>
>> >> >> >> How's that any different than cd example; java -jar start.jar?  Or
>> do
>> >> >> you
>> >> >> >> mean a Solr client webapp?
>> >> >> >>
>> >> >> >>> Finally, with projects such as Bobo, integration with Spring
>> would
>> >> make
>> >> >> >>> configuration more consistent and request significantly less java
>> >> >> coding
>> >> >> >>> just to add new capabilities everytime someone authors a new
>> >> >> RequestHandler.
>> >> >> >>
>> >> >> >> It's one line of config to add a new request handler.  How many
>> >> >> ridiculously
>> >> >> >> ugly confusing lines of Spring XML would it take?
>> >> >> >>
>> >> >> >>>  The biggest thing I learned about Solr in my work thusfar is
>> that
>> >> >> patches
>> >> >> >>> like these could be standalone modules in separate projects if it
>> >> >> weren't
>> >> >> 

Re: collapse exception

2010-06-22 Thread Martijn v Groningen
I checked your stacktrace and I can't remember putting
SolrIndexSearcher.getDocListAndSet(...) in the doQuery(...) method. I
guess the patch was modified before it was applied.
I think the error occurs when you do a field collapse search with a fq
parameter. That is the only reason I can think of why this exception
is thrown.

> When this component become a contrib? Using patch is so annoying
Patching is a bit of a hassle. This patch has some changes in the
SolrIndexSearcher which makes it difficult to make it a contrib or an
extension.

On 22 June 2010 04:52, Li Li  wrote:
> I don't know because it's patched by someone else but I can't get his
> help. When this component become a contrib? Using patch is so annoying
>
> 2010/6/22 Martijn v Groningen :
>> What version of Solr and which patch are you using?
>>
>> On 21 June 2010 11:46, Li Li  wrote:
>>> it says  "Either filter or filterList may be set in the QueryCommand,
>>> but not both." I am newbie of solr and have no idea of the exception.
>>> What's wrong with it? thank you.
>>>
>>> java.lang.IllegalArgumentException: Either filter or filterList may be
>>> set in the QueryCommand, but not both.
>>>        at 
>>> org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711)
>>>        at 
>>> org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286)
>>>        at 
>>> org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205)
>>>        at 
>>> org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246)
>>>        at 
>>> org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173)
>>>        at 
>>> org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174)
>>>        at 
>>> org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
>>>        at 
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
>>>        at 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>>        at 
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>>        at 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>>        at 
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>        at 
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>        at 
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>        at 
>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>        at 
>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>        at 
>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>        at 
>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>        at 
>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>>        at 
>>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>>        at 
>>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>        at 
>>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>>        at java.lang.Thread.run(Thread.java:619)
>>>
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>
>



-- 
Met vriendelijke groet,

Martijn van Groningen


SOLR partial string matching question

2010-06-22 Thread Vladimir Sutskever
Hi,

Can you guys make a recommendation for which types/filters to use to accomplish
the following partial keyword match:


A. Actual Indexed Term:  "bank of america"

B. User Enters Search Term:  "of ameri"


I would like SOLR to match document "bank of america" with the partial string 
"of ameri"

Any suggestions?



Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.



This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  

Re: SOLR partial string matching question

2010-06-22 Thread Joe Calderon
You want a combination of WhitespaceTokenizer and EdgeNGramFilter:
http://lucene.apache.org/solr/api/org/apache/solr/analysis/WhitespaceTokenizerFactory.html
http://lucene.apache.org/solr/api/org/apache/solr/analysis/EdgeNGramFilterFactory.html

The first will create a token for each word; the second will create
multiple prefix tokens from each word.

Use the analysis link on the admin page to test your filter chain
and make sure it's doing what you want.
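
[A minimal fieldType along these lines; the type name and gram sizes are
illustrative, so verify the chain in the analysis page before relying on it:]

    <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

[Indexing "bank of america" then produces prefix grams such as "am", "ame",
"amer", "ameri", so the query tokens "of" and "ameri" both match.]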


On Tue, Jun 22, 2010 at 4:06 PM, Vladimir Sutskever
 wrote:
> Hi,
>
> Can you guys make a recommendation for which types/filters to use accomplish 
> the following partial keyword match:
>
>
> A. Actual Indexed Term:  "bank of america"
>
> B. User Enters Search Term:  "of ameri"
>
>
> I would like SOLR to match document "bank of america" with the partial string 
> "of ameri"
>
> Any suggestions?
>
>
>
> Kind regards,
>
> Vladimir Sutskever
> Investment Bank - Technology
> JPMorgan Chase, Inc.
>
>
>
> This email is confidential and subject to important disclaimers and
> conditions including on offers for the purchase or sale of
> securities, accuracy and completeness of information, viruses,
> confidentiality, legal privilege, and legal entity disclaimers,
> available at http://www.jpmorgan.com/pages/disclosures/email.


Re: collapse exception

2010-06-22 Thread Erik Hatcher
Martijn - Maybe the patches to SolrIndexSearcher could be extracted  
into a new issue so that we can put in the infrastructure at least.   
That way this could truly be a drop-in plugin without it actually  
being in core.  I haven't looked at the specifics, but I imagine we  
could get the core stuff adjusted to suit this plugin.


Erik

On Jun 22, 2010, at 5:24 PM, Martijn v Groningen wrote:


I checked your stacktrace and I can't remember putting
SolrIndexSearcher.getDocListAndSet(...) in the doQuery(...) method. I
guess the patch was modified before it was applied.
I think the error occurs when you do a field collapse search with a fq
parameter. That is the only reason I can think of why this exception
is thrown.

When this component become a contrib? Using patch is so annoying
Patching is a bit of a hassle. This patch has some changes in the
SolrIndexSearcher which makes it difficult to make it a contrib or an
extension.

On 22 June 2010 04:52, Li Li  wrote:

I don't know because it's patched by someone else but I can't get his
help. When this component become a contrib? Using patch is so  
annoying


2010/6/22 Martijn v Groningen :

What version of Solr and which patch are you using?

On 21 June 2010 11:46, Li Li  wrote:
it says "Either filter or filterList may be set in the QueryCommand,
but not both." I am newbie of solr and have no idea of the exception.
What's wrong with it? thank you.

java.lang.IllegalArgumentException: Either filter or filterList may be
set in the QueryCommand, but not both.
        at org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286)
        at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205)
        at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246)
        at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173)
        at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174)
        at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:619)





--
Met vriendelijke groet,

Martijn van Groningen







--
Met vriendelijke groet,

Martijn van Groningen




Re: Help with highlighting

2010-06-22 Thread Erik Hatcher
You need to share with us the Solr request you made, and any custom
request handler settings it might map to.  Chances are you just need
to twiddle with the highlighter parameters (see wiki for docs) to get  
it to do what you want.
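
[For instance, highlighting only covers the fields named in hl.fl, which
falls back to the default search field when unset. A request along these
lines (field names illustrative) asks for highlights on other stored fields:]

    http://localhost:8983/solr/select?q=foo&hl=true&hl.fl=title,description&hl.requireFieldMatch=false

[The fields listed in hl.fl must be stored; hl.requireFieldMatch controls
whether a field is highlighted only when the query actually matched in that
field.]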


Erik

On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote:

Hi, I need help with highlighting fields that would match a query.  
So far, my results only highlight if the field is from all_text, and  
I would like it to use other fields. It simply isn't the case if I  
just turn highlighting on. Any ideas why it only applies to  
all_text? Here is my schema:

[schema.xml snipped: the XML tags were stripped by the list archive. The
surviving attribute fragments show analyzers built from WordDelimiterFilter,
SynonymFilter, stopword/protected-word files, and a shingle filter, plus the
uniqueKey ("unique_key") and defaultSearchField ("all_text").]










Re: Field missing when use distributed search + dismax

2010-06-22 Thread Lance Norskog
Do all of the Solr instances, including the broker, use the same schema.xml?

On 6/22/10, Scott Zhang  wrote:
> Hi. All.
>    I was using distributed search over 30 Solr instances; the previous setup
> was using the standard query handler, and the results were returned correctly:
> each result has 2 fields, "ID" and "type".
>    Today I want to search with dismax. I tried searching each
> instance with dismax, and it works correctly, returning "ID" and "type" for each
> result. The strange thing is that when I
> use distributed search, the results only have "ID". The field "type"
> disappeared. I need that "type" to know what the "ID" refers to. Why does Solr
> "eat" my "type"?
>
>
> Thanks.
> Regards.
> Scott
>


-- 
Lance Norskog
goks...@gmail.com


about function query

2010-06-22 Thread Li Li
I want to integrate a document's timestamp into the scoring of searches. I
found an example in the book "Solr 1.4 Enterprise Search Server" about
function queries. I want to boost newer documents, so the function might
be something like 1/(timestamp+1). But the function query's value is
added to the final score, not multiplied, so I can't tune the
parameter well.
E.g.:
If the search term is term1, the top docs are doc1 with score 2.0 and doc2 with score 1.5.
If the search term is term2, the top docs are doc1 with score 20 and doc2 with score 15.
It is hard to adjust the relative score of these two docs by adding a value.
If it were multiplication, it would be easy: if doc1 is very old we assign it
a time score of 1, and doc2 is new, we assign it a time score of 2;
the total scores are then 2.0*1 and 1.5*2, so doc2 ranks higher than doc1.
But when using addition, 2.0 + weight*1 versus 1.5 + weight*2, it's hard to get a
proper weight:
if we let weight be 1, it works well for term1,
but with term2 (20 + 1*1 versus 15 + 1*2) time has little influence on the
final result.
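
[One multiplicative option in Solr 1.4 is the boost query parser combined
with a reciprocal of document age; a sketch, assuming a date field named
"timestamp":]

    q={!boost b=recip(ms(NOW,timestamp),3.16e-11,1,1)}term1

[recip(x,m,a,b) computes a/(m*x + b), so the boost is 1 for a brand-new
document and decays with age; with m = 3.16e-11, about one over the number
of milliseconds in a year, a one-year-old document is boosted by roughly 0.5.
Because the boost multiplies the raw query score, it sidesteps the
additive-weight problem described above.]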


Re: Nested table support ability

2010-06-22 Thread amit_ak

Hi Otis, Thanks for the update.

My parametric search has to span the customer table and 30 child tables.
We have close to 1 million customers. Do you think Lucene/Solr is the right
solution for such requirements, or would a database search be more optimal?

Regards,
Amit

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field missing when use distributed search + dismax

2010-06-22 Thread Scott Zhang
Hi. Lance.

Thanks for replying.

Yes, I specifically checked the schema.xml, and did another simple test.
The broker is running on localhost:7499/solr.  A Solr instance is running on
localhost:7498/solr. For this test I only use these 2 instances. 7499's
index is empty; 7498 has 12 documents in its index. I copied the schema.xml from
7498 to 7499 before the test.
1. http://localhost:7498/solr/select
I get:
[XML response snipped: numFound="12"; each doc carries both fields, e.g.
gppost_6179 with type gppost.]

2. http://localhost:7499/solr/select
I get:
[XML response snipped: an empty result list; the broker's own index holds no
documents.]
3. http://localhost:7499/solr/select?shards=localhost:7498/solr
I get:
[XML response snipped: docs gppost_6179 and gppost_6282 come back with the
ID field only; the "type" field is missing.]

So strange!

I then checked with the standard search handler.
1. http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship
[XML response snipped: one doc, member_marship11, with type "member" and
date 2010-01-21T00:00:00Z.]

And 2.
http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship&qt=dismax
[XML response snipped: numFound="1"; the doc member_marship11 comes back
with the ID field only.]

So strange!

On Wed, Jun 23, 2010 at 11:12 AM, Lance Norskog  wrote:

> Do all of the Solr instances, including the broker, use the same
> schema.xml?
>
> On 6/22/10, Scott Zhang  wrote:
> > Hi. All.
> >I was using distributed search over 30 solr instance, the previous one
> > was using the standard query handler. And the result was returned
> correctly.
> > each result has 2 fields. "ID" and "type".
> >Today I want to use search withk dismax, I tried search with each
> > instance with dismax. It works correctly, return "ID" and "type" for each
> > result. The strange thing is when I
> > use distributed search, the result only have "ID". The field "type"
> > disappeared. I need that "type" to know what the "ID" refer to. Why solr
> > "eat" my "type"?
> >
> >
> > Thanks.
> > Regards.
> > Scott
> >
>
>
> --
> Lance Norskog
> goks...@gmail.com
>