Re: Fw:how to make fdx file

2012-03-02 Thread C.Yunqin
Yes, the fdt file is still there. Can I make a new fdx file from the fdt file?
Is there a possibility that during the process of updating and optimizing, the
index will be deleted and then re-generated?
  
  
  
  ------ Original ------
  From: "Erick Erickson"
  Date: Sat, Mar 3, 2012 08:28 AM
  To: "solr-user"
  Subject: Re: Fw:how to make fdx file

  
As far as I know, fdx files don't just disappear, so I can only assume
that something external removed it.

That said, if you somehow re-indexed and had no fields where
stored="true", then the fdx file may not be there.

Are you seeing problems as a result? This file is used to store
index information for stored fields. Do you have an fdt file?

Best
Erick

On Fri, Mar 2, 2012 at 2:48 AM, C.Yunqin <345804...@qq.com> wrote:
> Hi,
>   my fdx file unexpectedly disappeared, and then the Solr server stopped
> running; what can I do to recover Solr?
>
>  Other files still exist.
>
>  Thanks very much
>
>
> 

Re: What is the latest solr version

2012-03-02 Thread Mark Miller
3.5 is the latest release. When we talk about 4, we are talking about trunk -
the latest development line. It's a major release that has been in the works
for a long time now.

On Mar 2, 2012, at 4:18 PM, Mike Austin wrote:

> I've heard some people talk about solr4.. but I only see solr 3.5 available.
> 
> Thanks

- Mark Miller
lucidimagination.com


Re: Help with Synonyms

2012-03-02 Thread Koji Sekiguchi

(12/03/03 1:39), Donald Organ wrote:

I am trying to get synonyms working correctly. I want to map "floor locker"
to "storage locker".

Currently, searching for "storage locker" produces results, whereas searching
for "floor locker" does not produce any results.
I have the following setup for index-time synonyms:

  [fieldType and analyzer XML stripped in the archive]

And my synonyms.txt looks like this:

floor locker=>storage locker

What am I doing wrong?


Hi Donald,

Try removing tokenizerFactory="KeywordTokenizerFactory" from your synonym filter
definition, because you want the synonym settings in synonyms.txt to be tokenized
as "floor" / "locker" => "storage" / "locker". If you set it to KeywordTokenizer,
it becomes a map of the single token "floor locker" => "storage locker"; and since
you are using WhitespaceTokenizer for the <tokenizer> in your <analyzer>, indexing
"floor locker" produces the tokens "floor" / "locker" (not "floor locker"), so it
will never match your synonym map.
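A sketch of what the index-time analyzer could look like once the attribute is removed (the type and file names are illustrative, not taken from Donald's schema):

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no tokenizerFactory attribute: entries in synonyms.txt are split on
         whitespace, so "floor locker" becomes the token sequence
         "floor" "locker" and can match the whitespace-tokenized stream -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```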

As an aside, I recommend putting the <charFilter> - <tokenizer> - <filter>
chain in the natural order in your <analyzer>, though even if the order is
wrong it is not the cause of this problem.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


Remove underscore char when indexing and query problem

2012-03-02 Thread Floyd Wu
Hi there,

I have a document and its title is "20111213_solr_apache conference report".

When I use analysis web interface to see what tokens exactly solr analyze
and the following is the result

term text: 20111213_solr / apache / conference / report

Why is 20111213_solr kept as a single term, and why isn't the "_" char removed?
(I've added "_" as a stop word in stopwords.txt.)

I did another test with "20111213_solr_apache conference_report".
As you can see, the difference is that I added an underscore char between
conference and report. Analyzing this string gives:

term text: 20111213_solr / apache / conference / report
this time the underscore char between conference and report is removed!

Why? How can I make Solr remove the underscore char and behave consistently?
Please help with this.
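A stop word cannot remove a character inside a token; stopwords.txt only drops whole tokens. One common fix is to turn the underscore into a space before tokenization with a char filter. A sketch (the field type name is illustrative):

```xml
<fieldType name="text_nounderscore" class="solr.TextField">
  <analyzer>
    <!-- replace every "_" with a space before the tokenizer runs,
         so 20111213_solr_apache becomes three separate tokens -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="_" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

WordDelimiterFilterFactory is another option if you want to split on underscores while also preserving the original token.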

Thanks in advance.

Floyd


Re: Custom Component which removes documents from response

2012-03-02 Thread Jamie Johnson
I suppose it would help if I populated the list I try to remove things
from... I believe it's working once I did that. Now that this is out
there, is there a better way to do something like this?
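The fix Jamie describes, populating the new list from the field's values before calling removeAll, can be sketched with plain collections outside of Solr (the class and method names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class FilterSortValues {
    // The fix: copy the per-field sort values into the new list *before*
    // calling removeAll, so there is actually something to filter.
    public static Object[] filter(Object[] sortValues, Collection<?> filteredDocs) {
        List<Object> sortValueList = new ArrayList<>(Arrays.asList(sortValues));
        sortValueList.removeAll(filteredDocs);
        return sortValueList.toArray();
    }

    public static void main(String[] args) {
        Object[] kept = filter(new Object[] {10, 20, 30}, Arrays.asList(20));
        System.out.println(Arrays.toString(kept)); // [10, 30]
    }
}
```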

On Fri, Mar 2, 2012 at 10:19 PM, Jamie Johnson  wrote:
> On a previous version of a solr snapshot we had a custom component
> which did the following
>
>
>                boolean fsv =
> req.getParams().getBool(ResponseBuilder.FIELD_SORT_VALUES,false);
>            if(fsv){
>                NamedList sortVals = (NamedList) 
> rsp.getValues().get("sort_values");
>                if(sortVals != null){
>                        Sort sort = 
> searcher.weightSort(rb.getSortSpec().getSort());
>                        SortField[] sortFields = sort==null ? new
> SortField[]{SortField.FIELD_SCORE} : sort.getSort();
>
>                        for (SortField sortField: sortFields) {
>                                String fieldname = sortField.getField();
>                                if(sortVals.get(fieldname) == null)
>                                        continue;
>                                ArrayList list = (ArrayList) 
> sortVals.get(fieldname);
>                                list.removeAll(filteredDocs);
>
>                        }
>                }
>            }
>
> where filteredDocs is an ArrayList of integers containing the
> documents to remove from the sort vals. On the current Solr trunk,
> sortVals is now an Object[]. I tried to update the code; I had thought
> that doing the following would provide the same result, but alas it
> does not. Am I missing something that should be happening here?
>
>
>                boolean fsv =
> req.getParams().getBool(ResponseBuilder.FIELD_SORT_VALUES,false);
>            if(fsv){
>                NamedList sortVals = (NamedList) 
> rsp.getValues().get("sort_values");
>                if(sortVals != null){
>                        Sort sort = 
> searcher.weightSort(rb.getSortSpec().getSort());
>                        SortField[] sortFields = sort==null ? new
> SortField[]{SortField.FIELD_SCORE} : sort.getSort();
>
>                        for (SortField sortField: sortFields) {
>                                String fieldname = sortField.getField();
>                                if(sortVals.get(fieldname) == null)
>                                        continue;
>                                Object[] sortValue = 
> (Object[])sortVals.get(fieldname);
>                                List sortValueList = new 
> ArrayList(sortValue.length);
>                                sortValueList.removeAll(filteredDocs);
>
>                                sortVals.remove(fieldname);
>                                sortVals.add(fieldname, 
> sortValueList.toArray());
>
>                        }
>                }
>            }


Custom Component which removes documents from response

2012-03-02 Thread Jamie Johnson
On a previous version of a solr snapshot we had a custom component
which did the following


boolean fsv = req.getParams().getBool(ResponseBuilder.FIELD_SORT_VALUES, false);
if (fsv) {
    NamedList sortVals = (NamedList) rsp.getValues().get("sort_values");
    if (sortVals != null) {
        Sort sort = searcher.weightSort(rb.getSortSpec().getSort());
        SortField[] sortFields = sort == null
                ? new SortField[] { SortField.FIELD_SCORE }
                : sort.getSort();

        for (SortField sortField : sortFields) {
            String fieldname = sortField.getField();
            if (sortVals.get(fieldname) == null)
                continue;
            ArrayList list = (ArrayList) sortVals.get(fieldname);
            list.removeAll(filteredDocs);
        }
    }
}

where filteredDocs is an ArrayList of integers containing the
documents to remove from the sort vals. On the current Solr trunk,
sortVals is now an Object[]. I tried to update the code; I had thought
that doing the following would provide the same result, but alas it
does not. Am I missing something that should be happening here?


boolean fsv = req.getParams().getBool(ResponseBuilder.FIELD_SORT_VALUES, false);
if (fsv) {
    NamedList sortVals = (NamedList) rsp.getValues().get("sort_values");
    if (sortVals != null) {
        Sort sort = searcher.weightSort(rb.getSortSpec().getSort());
        SortField[] sortFields = sort == null
                ? new SortField[] { SortField.FIELD_SCORE }
                : sort.getSort();

        for (SortField sortField : sortFields) {
            String fieldname = sortField.getField();
            if (sortVals.get(fieldname) == null)
                continue;
            Object[] sortValue = (Object[]) sortVals.get(fieldname);
            List sortValueList = new ArrayList(sortValue.length);
            sortValueList.removeAll(filteredDocs);

            sortVals.remove(fieldname);
            sortVals.add(fieldname, sortValueList.toArray());
        }
    }
}


Re: How can Solr do parallel query warming with <listener> and <firstSearcher>?

2012-03-02 Thread Lance Norskog
The code does everything in single-threaded mode, but is coded to use
a multi-threaded Java ExecutorService. So, I've filed a request:

https://issues.apache.org/jira/browse/SOLR-3197
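The gist of the request, in generic form: hand each warming task to an ExecutorService instead of running it inline. A standalone sketch where the Runnables stand in for hypothetical warming queries:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelWarming {
    // Submit each warming task to a pool instead of running them one by one.
    public static void warm(List<Runnable> warmingTasks, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Runnable task : warmingTasks) {
            pool.submit(task);
        }
        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for all warming tasks
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        warm(List.<Runnable>of(done::incrementAndGet, done::incrementAndGet,
                               done::incrementAndGet, done::incrementAndGet), 4);
        System.out.println(done.get()); // 4
    }
}
```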



On Fri, Mar 2, 2012 at 12:40 PM, Neil Hooey  wrote:
>> Someone at Lucid Imagination suggested using multiple <listener event="firstSearcher"> tags, each with a single facet query in them,
>> but those are still done in parallel.
>
> I meant to say: "but those are still done in sequence".
>
>
> On Fri, Mar 2, 2012 at 3:37 PM, Neil Hooey  wrote:
>> I'm trying to get Solr to run warming queries in parallel with
>> listener events, but it always does them in sequence, pegging one CPU
>> while calculating facet counts.
>>
>> Someone at Lucid Imagination suggested using multiple <listener event="firstSearcher"> tags, each with a single facet query in them,
>> but those are still done in parallel.
>>
>> Is it possible to run warming queries in parallel, and if so, how?
>>
>> I'm aware that you could run an external script that forks, but I'd
>> like to use Solr's native support for this if it exists.
>>
>> Examples that don't work:
>>
>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>  <arr name="queries">
>>    <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
>>    <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
>>    <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
>>    <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
>>  </arr>
>> </listener>
>>
>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>  <arr name="queries">
>>    <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
>>  </arr>
>> </listener>
>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>  <arr name="queries">
>>    <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
>>  </arr>
>> </listener>
>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>  <arr name="queries">
>>    <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
>>  </arr>
>> </listener>
>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>  <arr name="queries">
>>    <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
>>  </arr>
>> </listener>



-- 
Lance Norskog
goks...@gmail.com


Re: Retrieving multiple levels with hierarchical faceting in Solr

2012-03-02 Thread Erick Erickson
A lot depends on the analysis chain your field is actually using, that is
the tokens that are in the index. Can you supply the schema.xml
file for the field in question?

Best
Erick

On Fri, Mar 2, 2012 at 7:21 AM, adrian.strin...@holidaylettings.co.uk
 wrote:
> I've got a hierarchical facet in my Solr collection; root level values are 
> prefixed with 0;, and the next level is prefixed 1_foovalue;.  I can get the 
> root level easily enough, but when foovalue is selected I need to retrieve 
> the next level in the hierarchy while still displaying all of the options in 
> the root level.  I can't work out how to request either two different 
> prefixes for the facet, or the same facet twice using different prefixes.
>
> I've found a couple of discussions online that suggest I ought to be able to 
> set the prefix using local params:
>
>    facet.field={!prefix=0;}foo
>    facet.field={!prefix=1_foovalue; key=bar}foo
>
> but the prefix seems to be ignored, as the facet returned contains all 
> values.  Should I just  so I can query 
> using f.foo.facet.prefix=0;&f.bar.facet.prefix=1_foovalue;, or is there 
> another way I can request the two different levels of my facet hierarchy at 
> once?
>
> I'm using Solr 3.5.
>
> Thanks,
> Ade
>
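For reference, the two-field workaround Ade alludes to would look something like this, assuming foo is copied into a second field bar with a copyField rule (the names are illustrative):

```
facet=true&facet.field=foo&facet.field=bar&f.foo.facet.prefix=0;&f.bar.facet.prefix=1_foovalue;
```

The per-field f.<field>.facet.prefix parameters work on Solr 3.5, whereas a prefix inside facet.field local params is not supported there.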


Re: A sorting question.

2012-03-02 Thread Erick Erickson
I'm not quite sure what you mean by "order with numeric logic".

You're right, the default ordering is by score. I can't think of anything
that would arbitrarily sort by a varying input string, that is,
id:(a OR b OR c OR d) would sort differently than
id:(b OR a OR d OR c).

Perhaps if you outlined the problem you're trying to solve alternate
approaches might be possible...

Best
Erick

On Fri, Mar 2, 2012 at 4:22 AM, Luis Cappa Banda  wrote:
> The only reference I found is:
>
> http://stackoverflow.com/questions/5753079/solr-query-without-order
>
> Anyone had the same problem? Maybe using a dynamic field could solve this
> issue?
>
> Thanks!
>
>
> Luis Cappa.
>
>
> 2012/3/2 Luis Cappa Banda 
>
>> Hello!
>>
>> Just a brief question. I'm querying by my docs ids to retrieve the whole
>> document data from them, and I would like to retrieve them in the same
>> order as I queried. Example:
>>
>> *q*=id:(A+OR+B+OR+C+OR...)
>>
>> And I would like to get a response with a default order like:
>>
>> response:
>>
>>     *docA*:{
>>
>>              }
>>
>>
>>     *docB*:{
>>
>>              }
>>
>>
>>     *docC*:{
>>
>>              }
>>
>>     Etc.
>>
>>
>> The default response returns the documents in a different order, I suppose
>> due to Solr's internal scoring algorithm. The ids are not numeric, so there
>> is no option to order them with numeric logic. Any suggestion?
>>
>> Thanks a lot!
>>
>>
>>
>> Luis Cappa.
>>


Re: Fw:how to make fdx file

2012-03-02 Thread Erick Erickson
As far as I know, fdx files don't just disappear, so I can only assume
that something external removed it.

That said, if you somehow re-indexed and had no fields where
stored="true", then the fdx file may not be there.

Are you seeing problems as a result? This file is used to store
index information for stored fields. Do you have an fdt file?

Best
Erick

On Fri, Mar 2, 2012 at 2:48 AM, C.Yunqin <345804...@qq.com> wrote:
> Hi,
>   my fdx file unexpectedly disappeared, and then the Solr server stopped
> running; what can I do to recover Solr?
>
>  Other files still exist.
>
>  Thanks very much
>
>
> 


Including an attribute value from a higher level entity when using DIH to index an XML file

2012-03-02 Thread Mike O'Leary
I have an XML file that I would like to index, that has a structure similar to 
this:

<data>
  <user id-num="...">
    <message>[message text]</message>
    ...
  </user>
  ...
</data>

I would like to have the documents in the index correspond to the messages in 
the xml file, and have the user's [id-num] value stored as a field in each of 
the user's documents. I think this means that I have to define an entity for 
message that looks like this:

<entity name="message"
        processor="XPathEntityProcessor"
        forEach="/data/user/message/"
        ...>
  <field column="message" xpath="/data/user/message"/>
  ...
</entity>

but I don't know where to put the field definition for the user id. It would
look like:

<field column="user" xpath="/data/user/@id-num"/>
I can't put it within the message entity, because the entity is defined with
forEach="/data/user/message/" and the id field's xpath value is outside of the
entity's scope. Putting the id field definition there causes a null pointer
exception. I don't think I want to create a "user" entity that the "message"
entity is nested inside of - or is there a way to do that and still have the
index documents correspond to messages from the file? Are there one or more
attributes or attribute values that I haven't run across in my searching that
provide a way to do what I need to do?
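One DIH feature that may help here: XPathEntityProcessor supports commonField="true", which carries a value read from an ancestor of the forEach node onto every row under it, so the user id can live inside the message entity. A sketch (the url and column names are illustrative):

```xml
<entity name="message"
        processor="XPathEntityProcessor"
        forEach="/data/user/message"
        url="data.xml">
  <!-- commonField="true": the value read at /data/user/@id-num is
       remembered and applied to every message row under that user -->
  <field column="user" xpath="/data/user/@id-num" commonField="true"/>
  <field column="message" xpath="/data/user/message"/>
</entity>
```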
Thanks,
Mike




Payloads slowing down add/delete doc

2012-03-02 Thread Gary Yang
Hi, there

In order to keep a DocID-vs-UID map, we added payloads to a Solr core. Search
on UID is very fast, but we have a problem with adding/deleting docs. Every
time we commit an add/delete, Solr/Lucene takes up to 30 seconds to complete.
Without payloads, the same action completes in milliseconds.

We do need real time commit.

Here is the payload definition:

[fieldType and field definitions stripped in the archive]

Any suggestions?

Any help is appreciated.

Best Regards

G. Y.


Re: Solr Design question on spatial search

2012-03-02 Thread Venu Gmail Dev
Sorry for not being clear enough.

I don't know the point of origin. All I know is that there are 20K retail
stores. Only the cities within a 10-mile radius of these stores should be
searchable. Any city outside these small 10-mile circles around the 20K
stores should be ignored.

So when somebody searches for a city, I need to query the cities that fall
within these 20K 10-mile circles, but I don't know which 10-mile circle to
query.

So the approach that I was thinking were :-

 a) Have 2 separate indexes. The first stores the information about all
 the cities and the second stores the retail store information. Whenever
 a user searches for a city, I return all the matching cities (and hence
 the lat-longs) from the first index and then do a spatial search on each
 of the matched cities in the second index. But this is too costly.

 b) Index only the cities which have a nearby store. Do all the
 calculation(s) before indexing the data so that the search is fast. The
 problem that I see with this approach is that if a new retail store or a
 city is added then I would have to re-index all the data again.

Does this answer the problem that you posed?

Thanks,
Venu.

On Mar 2, 2012, at 9:52 PM, Erick Erickson wrote:

> But again, that doesn't answer the problem I posed. Where is your
> point of origin?
> There's nothing in what you've written that indicates how you would know
> that 10 miles is relative to San Francisco. All you've said is that
> you're searching
> on "San". Which would presumably return San Francisco, San Mateo, San Jose.
> 
> Then, also presumably, you're looking for all the cities with stores
> within 10 miles
> of one of these cities. But nothing in your criteria so far says that
> that city is
> San Francisco.
> 
> If you already know that San Francisco is the locus, simple distance
> will work just
> fine. You can index both city and store info in the same index and
> restrict, say, facets
> (or, indeed search results) by fq clause (e.g. fq=type:city or fq=type:store).
> 
> Or I'm completely missing the boat here.
> 
> Best
> Erick
> 
> 
> On Fri, Mar 2, 2012 at 11:50 AM, Venu Dev  wrote:
>> So let's say x=10 miles. Now if I search for San then San Francisco, San 
>> Mateo should be returned because there is a retail store in San Francisco. 
>> But San Jose should not be returned because it is more than 10 miles away 
>> from San
>> Francisco. Had there been a retail store in San Jose then it should be also 
>> returned when you search for San. I can restrict the queries to a country.
>> 
>> Thanks,
>> ~Venu
>> 
>> On Mar 2, 2012, at 5:57 AM, Erick Erickson  wrote:
>> 
>>> I don't see how this works, since your search for San could also return
>>> San Marino, Italy. Would you then return all retail stores in
>>> X miles of that city? What about San Salvador de Jujuy, Argentina?
>>> 
>>> And even in your example, San would match San Mateo. But should
>>> the search then return any stores within X miles of San Mateo?
>>> You have to stop somewhere
>>> 
>>> Is there any other information you have that restricts how far to expand the
>>> search?
>>> 
>>> Best
>>> Erick
>>> 
>>> On Thu, Mar 1, 2012 at 4:57 PM, Venu Gmail Dev  
>>> wrote:
 I don't think Spatial search will fully fit into this. I have 2 approaches 
 in mind but I am not satisfied with either one of them.
 
 a) Have 2 separate indexes. First one to store the information about all 
 the cities and second one to store the retail stores information. Whenever 
 user searches for a city then I return all the matching cities from first 
 index and then do a spatial search on each of the matched city in the 
 second index. But this is too costly.
 
 b) Index only the cities which have a nearby store. Do all the 
 calculation(s) before indexing the data so that the search is fast. The 
 problem that I see with this approach is that if a new retail store or a 
 city is added then I would have to re-index all the data again.
 
 
 On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:
 
> I believe that what you need is spatial search...
> 
> Have a look a the documention:  http://wiki.apache.org/solr/SpatialSearch
> 
> On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar 
> wrote:
> 
>> Hello,
>> 
>> I have a design question for Solr.
>> 
>> I work for an enterprise which has a lot of retail stores (approx. 20K).
>> These retail stores are spread across the world.  My search requirement 
>> is
>> to find all the cities which are within x miles of a retail store.
>> 
>> So lets say if we have a retail Store in San Francisco and if I search 
>> for
>> "San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
>> returned as they are within x miles from San Francisco. I also want to 
>> rank
>> the search results b

Re: Solr Design question on spatial search

2012-03-02 Thread Erick Erickson
But again, that doesn't answer the problem I posed. Where is your
point of origin?
There's nothing in what you've written that indicates how you would know
that 10 miles is relative to San Francisco. All you've said is that
you're searching
on "San". Which would presumably return San Francisco, San Mateo, San Jose.

Then, also presumably, you're looking for all the cities with stores
within 10 miles
of one of these cities. But nothing in your criteria so far says that
that city is
San Francisco.

If you already know that San Francisco is the locus, simple distance
will work just
fine. You can index both city and store info in the same index and
restrict, say, facets
(or, indeed search results) by fq clause (e.g. fq=type:city or fq=type:store).
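Once the locus is known, the distance restriction can be expressed with Solr's spatial filter. A sketch assuming a LatLonType field named loc, San Francisco as the point of origin, and 10 miles expressed in kilometers (the field names are illustrative):

```
q=name:san*&fq=type:store&sfield=loc&pt=37.7749,-122.4194&fq={!geofilt d=16.1}&sort=geodist() asc
```

The geodist() sort reuses the top-level sfield and pt parameters, so results come back ranked by distance from the locus.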

Or I'm completely missing the boat here.

Best
Erick


On Fri, Mar 2, 2012 at 11:50 AM, Venu Dev  wrote:
> So let's say x=10 miles. Now if I search for San then San Francisco, San 
> Mateo should be returned because there is a retail store in San Francisco. 
> But San Jose should not be returned because it is more than 10 miles away 
> from San
> Francisco. Had there been a retail store in San Jose then it should be also 
> returned when you search for San. I can restrict the queries to a country.
>
> Thanks,
> ~Venu
>
> On Mar 2, 2012, at 5:57 AM, Erick Erickson  wrote:
>
>> I don't see how this works, since your search for San could also return
>> San Marino, Italy. Would you then return all retail stores in
>> X miles of that city? What about San Salvador de Jujuy, Argentina?
>>
>> And even in your example, San would match San Mateo. But should
>> the search then return any stores within X miles of San Mateo?
>> You have to stop somewhere
>>
>> Is there any other information you have that restricts how far to expand the
>> search?
>>
>> Best
>> Erick
>>
>> On Thu, Mar 1, 2012 at 4:57 PM, Venu Gmail Dev  
>> wrote:
>>> I don't think Spatial search will fully fit into this. I have 2 approaches 
>>> in mind but I am not satisfied with either one of them.
>>>
>>> a) Have 2 separate indexes. First one to store the information about all 
>>> the cities and second one to store the retail stores information. Whenever 
>>> user searches for a city then I return all the matching cities from first 
>>> index and then do a spatial search on each of the matched city in the 
>>> second index. But this is too costly.
>>>
>>> b) Index only the cities which have a nearby store. Do all the 
>>> calculation(s) before indexing the data so that the search is fast. The 
>>> problem that I see with this approach is that if a new retail store or a 
>>> city is added then I would have to re-index all the data again.
>>>
>>>
>>> On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:
>>>
 I believe that what you need is spatial search...

 Have a look a the documention:  http://wiki.apache.org/solr/SpatialSearch

 On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar 
 wrote:

> Hello,
>
> I have a design question for Solr.
>
> I work for an enterprise which has a lot of retail stores (approx. 20K).
> These retail stores are spread across the world.  My search requirement is
> to find all the cities which are within x miles of a retail store.
>
> So lets say if we have a retail Store in San Francisco and if I search for
> "San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
> returned as they are within x miles from San Francisco. I also want to 
> rank
> the search results by their distance.
>
> I can create an index with all the cities in it but I am not sure how do I
> ensure that the cities returned in a search result have a nearby retail
> store. Any suggestions ?
>
> Thanks,
> Venu,
>



 --
 Dirceu Vieira Júnior
 ---
 +47 9753 2473
 dirceuvjr.blogspot.com
 twitter.com/dirceuvjr
>>>


Re: date queries too slow

2012-03-02 Thread veerene
thanks for responding. we will try the trie fields.
the reason we are not using filters is these date values would change from
query to query.
we are dynamically populating these date values in the queries using the
current time.
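Even when the dates are generated per query, the filters can be made cache-friendly by rounding with date math, so every query issued within the same day reuses the same cached filter. A sketch:

```
q=tag:obama&fq=expirationdate:[NOW/DAY TO *]&fq=publicationdate:[* TO NOW/DAY]
```

Coarser or finer rounding (NOW/HOUR, NOW/MONTH) trades filter-cache hit rate against date precision.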



--
View this message in context: 
http://lucene.472066.n3.nabble.com/date-queries-too-slow-tp3794345p3794677.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help with duplicate unique IDs

2012-03-02 Thread alxsss

Take a look at Solr's deduplication support (SignatureUpdateProcessorFactory);
I think you must use dedup to solve this issue.
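For reference, deduplication is configured as an update request processor chain in solrconfig.xml. A sketch along the lines of the Solr wiki example (the chain name and signature fields are illustrative):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```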

 

 

-Original Message-
From: Thomas Dowling 
To: solr-user 
Cc: Mikhail Khludnev 
Sent: Fri, Mar 2, 2012 1:10 pm
Subject: Re: Help with duplicate unique IDs


Thanks.  In fact, the behavior I want is overwrite=true.  I want to be 
able to reindex documents, with the same id string, and automatically 
overwrite the previous version.


Thomas


On 03/02/2012 04:01 PM, Mikhail Khludnev wrote:
> Hello Tomas,
>
> I guess you could just specify overwrite=false
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22
>
>
On Fri, Mar 2, 2012 at 11:23 PM, Thomas Dowling wrote:
>
>> In a Solr index of journal articles, I thought I was safe reindexing
>> articles because their unique ID would cause the new record in the index to
>> overwrite the old one. (As stated at
>> http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?)
>>
>>

 


What is the latest solr version

2012-03-02 Thread Mike Austin
I've heard some people talk about Solr 4, but I only see Solr 3.5 available.

Thanks


Re: Help with duplicate unique IDs

2012-03-02 Thread Thomas Dowling
Thanks.  In fact, the behavior I want is overwrite=true.  I want to be 
able to reindex documents, with the same id string, and automatically 
overwrite the previous version.



Thomas


On 03/02/2012 04:01 PM, Mikhail Khludnev wrote:

Hello Tomas,

I guess you could just specify overwrite=false
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22


On Fri, Mar 2, 2012 at 11:23 PM, Thomas Dowling wrote:


In a Solr index of journal articles, I thought I was safe reindexing
articles because their unique ID would cause the new record in the index to
overwrite the old one. (As stated at
http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?)



Re: Help with duplicate unique IDs

2012-03-02 Thread Mikhail Khludnev
Hello Tomas,

I guess you could just specify overwrite=false
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22


On Fri, Mar 2, 2012 at 11:23 PM, Thomas Dowling wrote:

> In a Solr index of journal articles, I thought I was safe reindexing
> articles because their unique ID would cause the new record in the index to
> overwrite the old one. (As stated at
> http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?)
>
> My schema.xml includes:
>
> ...
>   <field name="id" ... required="true"/>
> ...
>
> And:
>
> <uniqueKey>id</uniqueKey>
>
> And yet I can compose a query with two hits in the index, showing:
>
> #1: 03405443/v66i0003/347_mrirtaitmbpa
> #2: 03405443/v66i0003/347_mrirtaitmbpa
>
>
> Can anyone give pointers on where I'm screwing something up?
>
>
> Thomas Dowling
> thomas.dowl...@gmail.com
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: How can Solr do parallel query warming with <listener> and <firstSearcher>?

2012-03-02 Thread Neil Hooey
> Someone at Lucid Imagination suggested using multiple <listener event="firstSearcher"> tags, each with a single facet query in them,
> but those are still done in parallel.

I meant to say: "but those are still done in sequence".


On Fri, Mar 2, 2012 at 3:37 PM, Neil Hooey  wrote:
> I'm trying to get Solr to run warming queries in parallel with
> listener events, but it always does them in sequence, pegging one CPU
> while calculating facet counts.
>
> Someone at Lucid Imagination suggested using multiple <listener event="firstSearcher"> tags, each with a single facet query in them,
> but those are still done in parallel.
>
> Is it possible to run warming queries in parallel, and if so, how?
>
> I'm aware that you could run an external script that forks, but I'd
> like to use Solr's native support for this if it exists.
>
> Examples that don't work:
>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>  <arr name="queries">
>    <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
>    <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
>    <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
>    <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
>  </arr>
> </listener>
>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>  <arr name="queries">
>    <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
>  </arr>
> </listener>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>  <arr name="queries">
>    <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
>  </arr>
> </listener>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>  <arr name="queries">
>    <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
>  </arr>
> </listener>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>  <arr name="queries">
>    <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
>  </arr>
> </listener>


How can Solr do parallel query warming with <listener> and <firstSearcher>?

2012-03-02 Thread Neil Hooey
I'm trying to get Solr to run warming queries in parallel with
listener events, but it always does them in sequence, pegging one CPU
while calculating facet counts.

Someone at Lucid Imagination suggested using multiple <listener event="firstSearcher"> tags, each with a single facet query in them,
but those are still done in parallel.

Is it possible to run warming queries in parallel, and if so, how?

I'm aware that you could run an external script that forks, but I'd
like to use Solr's native support for this if it exists.

Examples that don't work:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
    <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
    <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
    <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
  </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
  </arr>
</listener>


Re: date queries too slow

2012-03-02 Thread Ahmet Arslan


--- On Fri, 3/2/12, veerene  wrote:

> From: veerene 
> Subject: date queries too slow
> To: solr-user@lucene.apache.org
> Date: Friday, March 2, 2012, 8:29 PM
> Hello,
> we are having significant performance problems with date
> queries on our
> production server. 
> we are using SOLR 1.4 (will be upgrading to latest version
> in the near
> future) and our index size is around 4GB with 2 million
> documents.
> for e.g: the query "tag:obama AND
> expirationdate:[2012-02-21T00:00:00Z TO *]
> AND publicationdate:[* TO 2012-02-21T00:00:00Z]" takes
> 1113ms and if we
> remove the dates "tag:obama" it takes only 98ms.
> 
> we tried storing the date as a long data field in the form
> "YYYMMDDHHMMSS"
> and we are seeing improvement in the query times, but it's
> not significant.
> 
> I have read somewhere is we use "TrieDate" field instead
> "Date" it would
> help.

For range queries, definitely, trie based fields are the way to go. Also you 
can use filter query to re-use the filters.

q=tag:obama&fq=expirationdate:[2012-02-21T00:00:00Z TO *]&fq=publicationdate:[* 
TO 2012-02-21T00:00:00Z]

http://wiki.apache.org/solr/SolrCaching#filterCache
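For reference, a trie-based date setup in schema.xml looks roughly like this (precisionStep is tunable: a smaller step indexes more terms per value but makes range queries faster; field names follow the query above):

```xml
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>

<field name="expirationdate" type="tdate" indexed="true" stored="true"/>
<field name="publicationdate" type="tdate" indexed="true" stored="true"/>
```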


Re: Help with duplicate unique IDs

2012-03-02 Thread Pawel Rog
Once I had the same problem. I didn't know what was going on. After a few
moments of analysis I created a completely new index and removed the old one
(I didn't have enough time to analyze the problem). The problem hasn't come
back since.

--
Regards,
Pawel

On Fri, Mar 2, 2012 at 8:23 PM, Thomas Dowling  wrote:
> In a Solr index of journal articles, I thought I was safe reindexing
> articles because their unique ID would cause the new record in the index to
> overwrite the old one. (As stated at
> http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?)
>
> My schema.xml includes:
>
> ...
>   <field name="id" ... required="true"/>
> ...
>
> And:
>
> <uniqueKey>id</uniqueKey>
>
> And yet I can compose a query with two hits in the index, showing:
>
> #1: 03405443/v66i0003/347_mrirtaitmbpa
> #2: 03405443/v66i0003/347_mrirtaitmbpa
>
>
> Can anyone give pointers on where I'm screwing something up?
>
>
> Thomas Dowling
> thomas.dowl...@gmail.com


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-02 Thread Matthew Parker
I've ensured the SOLR data subdirectories and files were completely cleaned
out, but the issue still occurs.

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson wrote:

> Matt:
>
> Just for paranoia's sake, when I was playing around with this (the
> _version_ thing was one of my problems too) I removed the entire data
> directory as well as the zoo_data directory between experiments (and
> recreated just the data dir). This included various index.2012
> files and the tlog directory on the theory that *maybe* there was some
> confusion happening on startup with an already-wonky index.
>
> If you have the energy and tried that it might be helpful information,
> but it may also be a total red-herring
>
> FWIW
> Erick
>
> On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller  wrote:
> >> I assuming the windows configuration looked correct?
> >
> > Yeah, so far I can not spot any smoking gun...I'm confounded at the
> moment. I'll re read through everything once more...
> >
> > - Mark
>

--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.


Help with duplicate unique IDs

2012-03-02 Thread Thomas Dowling
In a Solr index of journal articles, I thought I was safe reindexing 
articles because their unique ID would cause the new record in the index 
to overwrite the old one. (As stated at 
http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?)


My schema.xml includes:

...

...

And:

id

And yet I can compose a query with two hits in the index, showing:

#1: 03405443/v66i0003/347_mrirtaitmbpa
#2: 03405443/v66i0003/347_mrirtaitmbpa


Can anyone give pointers on where I'm screwing something up?


Thomas Dowling
thomas.dowl...@gmail.com


date queries too slow

2012-03-02 Thread veerene
Hello,
we are having significant performance problems with date queries on our
production server. 
we are using SOLR 1.4 (will be upgrading to latest version in the near
future) and our index size is around 4GB with 2 million documents.
e.g. the query "tag:obama AND expirationdate:[2012-02-21T00:00:00Z TO *]
AND publicationdate:[* TO 2012-02-21T00:00:00Z]" takes 1113ms, but if we
remove the dates ("tag:obama" alone) it takes only 98ms.

we tried storing the date as a long data field in the form "YYYYMMDDHHMMSS"
and we are seeing improvement in the query times, but it's not significant.

I have read somewhere that if we use "TrieDate" fields instead of "Date" it
would help.
Do you guys think the performance improvement would be significant? if not,
do you see if there is any other alternative solution to speed up the date
queries?
I appreciate your response.
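For reference, a trie-based date field is declared in schema.xml roughly like the tdate type from the Solr example schema (precisionStep="6" is the example-schema default; the field names here just match the ones in the query above):

```xml
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>

<field name="publicationdate" type="tdate" indexed="true" stored="true"/>
<field name="expirationdate"  type="tdate" indexed="true" stored="true"/>
```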



--
View this message in context: 
http://lucene.472066.n3.nabble.com/date-queries-too-slow-tp3794345p3794345.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Ahmet Arslan
> Ahmet, this is a good find. Can we still open a JIRA issue
> so that a
> more useful exception is thrown here?

Robert, I created SOLR-3193 and created a test using Andrew's files.


Re: Solr Design question on spatial search

2012-03-02 Thread Venu Dev
So let's say x=10 miles. Now if I search for San then San Francisco, San Mateo 
should be returned because there is a retail store in San Francisco. But San 
Jose should not be returned because it is more than 10 miles away from San 
Francisco. Had there been a retail store in San Jose then it should be also 
returned when you search for San. I can restrict the queries to a country. 

Thanks,
~Venu

On Mar 2, 2012, at 5:57 AM, Erick Erickson  wrote:

> I don't see how this works, since your search for San could also return
> San Marino, Italy. Would you then return all retail stores in
> X miles of that city? What about San Salvador de Jujuy, Argentina?
> 
> And even in your example, San would match San Mateo. But should
> the search then return any stores within X miles of San Mateo?
> You have to stop somewhere
> 
> Is there any other information you have that restricts how far to expand the
> search?
> 
> Best
> Erick
> 
> On Thu, Mar 1, 2012 at 4:57 PM, Venu Gmail Dev  
> wrote:
>> I don't think Spatial search will fully fit into this. I have 2 approaches 
>> in mind but I am not satisfied with either one of them.
>> 
>> a) Have 2 separate indexes. First one to store the information about all the 
>> cities and second one to store the retail stores information. Whenever user 
>> searches for a city then I return all the matching cities from first index 
>> and then do a spatial search on each of the matched city in the second 
>> index. But this is too costly.
>> 
>> b) Index only the cities which have a nearby store. Do all the 
>> calculation(s) before indexing the data so that the search is fast. The 
>> problem that I see with this approach is that if a new retail store or a 
>> city is added then I would have to re-index all the data again.
>> 
>> 
>> On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:
>> 
>>> I believe that what you need is spatial search...
>>> 
>>> Have a look at the documentation: http://wiki.apache.org/solr/SpatialSearch
>>> 
>>> On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar 
>>> wrote:
>>> 
 Hello,
 
 I have a design question for Solr.
 
 I work for an enterprise which has a lot of retail stores (approx. 20K).
 These retail stores are spread across the world.  My search requirement is
 to find all the cities which are within x miles of a retail store.
 
 So lets say if we have a retail Store in San Francisco and if I search for
 "San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
 returned as they are within x miles from San Francisco. I also want to rank
 the search results by their distance.
 
 I can create an index with all the cities in it but I am not sure how do I
 ensure that the cities returned in a search result have a nearby retail
 store. Any suggestions ?
 
 Thanks,
 Venu,
 
>>> 
>>> 
>>> 
>>> --
>>> Dirceu Vieira Júnior
>>> ---
>>> +47 9753 2473
>>> dirceuvjr.blogspot.com
>>> twitter.com/dirceuvjr
>> 


Help with Synonyms

2012-03-02 Thread Donald Organ
I am trying to get synonyms working correctly. I want to map "floor locker"
to "storage locker".

Currently searching for "storage locker" produces results, whereas searching
for "floor locker" does not produce any results.
I have the following setup for index time synonyms:

And my synonyms.txt looks like this:

floor locker=>storage locker



What am I doing wrong?


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Ahmet Arslan
> But - the wiki page has a foot note that says "a tokenizer
> must be defined
> for the field, but it doesn't need to be indexed". The body
> field has the
> type "dcx_text" which has a tokenizer.
> 
> Is the documentation wrong here or am I misunderstanding
> something? 

Ah, I never read that note. (just looking on the table).

I think you are right, I can generate snippet from the following field:






Two Questions - Tomcat/Jetty & Java/PHP

2012-03-02 Thread Spadez
Hi,

I have two newbie questions. With all my searching I haven't been able to
find which would be the better choice to run my SOLR / Nutch install, Tomcat
or Jetty. There seem to be a lot of people on the internet saying Jetty has
better performance, but I haven't been able to see any proof of that.

Secondly, for my actual website script, is it better to write it in PHP or
Java? In my eyes, Java might be a good choice because then I can run it on
Tomcat without needing Apache, but PHP might be a good option because it
seems faster, and if my Java servlet goes down the site will still work.

Can anyone give me some input please?

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Two-Questions-Tomcat-Jetty-Java-PHP-tp3793845p3793845.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread andrew
Ah, ok - thank you for looking at it.

But - the wiki page has a foot note that says "a tokenizer must be defined
for the field, but it doesn't need to be indexed". The body field has the
type "dcx_text" which has a tokenizer.

Is the documentation wrong here or am I misunderstanding something? 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3793706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Robert Muir
On Fri, Mar 2, 2012 at 9:41 AM, Ahmet Arslan  wrote:
>
>> Robert, I just tried with
>> 3.6-SNAPSHOT 1296203 from svn - the problem is
>> still there.
>>
>> I am just about to leave for a vacation. I'll try to open a
>> JIRA issue this
>> evening.
>
> Andrew, thanks for providing files. I also re-produced it.
>
> But cause of the exception is that you are trying to highlight on a field 
> (body) that is not indexed.
>
> To enable highlighting you need both indexed="true" and stored="true" .
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
> I changed definition of body field from indexed="false" to indexed="true" and 
> it is working now.
>
> But for the record (with indexed="false"), it is weird that it produces 
> snippet in the first request, and then fails in the second request.
>
>

Ahmet, this is a good find. Can we still open a JIRA issue so that a
more useful exception is thrown here?


-- 
lucidimagination.com


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Ahmet Arslan

> Robert, I just tried with
> 3.6-SNAPSHOT 1296203 from svn - the problem is
> still there.
> 
> I am just about to leave for a vacation. I'll try to open a
> JIRA issue this
> evening.

Andrew, thanks for providing files. I also re-produced it. 

But cause of the exception is that you are trying to highlight on a field 
(body) that is not indexed. 

To enable highlighting you need both indexed="true" and stored="true" .
http://wiki.apache.org/solr/FieldOptionsByUseCase

I changed definition of body field from indexed="false" to indexed="true" and 
it is working now.

But for the record (with indexed="false"), it is weird that it produces a
snippet in the first request and then fails in the second request.




does the location of a match (within a field) affect the score?

2012-03-02 Thread geeky2
hello all,

example:

i have a field named itemNo

the user does a search, itemNo:665

there are three documents in the core, that look like this

doc1 - itemNo = 1237899*665*

doc2 - itemNo = *665*1237899

doc3 - itemNo = 123*665*7899



does the location or placement of the search string (beginning, middle, end)
affect the scoring of the document?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/does-the-location-of-a-match-within-a-field-affect-the-score-tp3793634p3793634.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread andrew
Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
still there.

I am just about to leave for a vacation. I'll try to open a JIRA issue this
evening.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3793593.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Making additional solr requests in an QueryResponseWriter

2012-03-02 Thread Donnie McNeal
Mikhail,

Thanks for the reply.  Regarding your comments:

1 - OK. That's good to know.

2 - I thought about adding the subcategories to the category after I sent
my original question. This could work, but there are times when we need the
subcategories returned within the parent document and times where we don't
need them.

I'll try another example to see if it helps explain our problem a little
better.

Suppose you have a category hierarchy like this:
Music
  +- Record Label
+- Artist 1
  +- Album A
  +- Album B
+- Artist 2

There are times when we would want to query and retrieve a record label
and its artists (subcategories), but we don't want any of the album
information from the artists. We also need to be able to just query on all
categories directly.

3 - We looked at the joins and you are right that is what we need, but we
won't be able to use solr 4.0 for a while.

Thanks,

Donnie
On Mar 1, 2012 10:22 PM, "Mikhail Khludnev" 
wrote:

> Hello Donnie,
>
> 1. Nothing beside of design consideration prevents you form doing search in
> QueryResponseWriter. You have a request, which isn't closed yet, where you
> can obtain searcher from.
> 2. Your usecase isn't clear. If you need just to search categories, and
> return the lists of subcategories per every category found, you can just
> put your subcats list into huge stored field.
> 3. Otherwise, it sounds like http://wiki.apache.org/solr/Join or like
> https://issues.apache.org/jira/browse/SOLR-3076 , which is in really early
> stage, though.
>
> Regards
>
> On Fri, Mar 2, 2012 at 2:40 AM, Donnie McNeal  >wrote:
>
> > Hi all,
> >
> > The documents in our solr index have a parent-child relationship which
> we
> > have basically flattened in our solr queries. We have massaged solr into
> > being the query API for 3rd-party data.  The relationship is a simple
> > parent-child relationship as follows:
> >
> > category
> > +-sub-category
> >
> > this ultimately maps to something like this in our Model:
> >
> > class Category
> >
> > List getChildCategories();
> >
> > Currently, in our application code we were thinking about issuing 2
> queries
> > one to get the parent category, and one to get the sub-categories of the
> > parent.  We would then assemble the results in our model.
> >
> > What I was wondering is would it be feasible (or even an ok practice) to
> > create a QueryResponseWriter (more than likely subclass an existing one
> > like XMLResponseFormat) that when requested it issue an additional call
> to
> > fetch the sub categories and add them to the original category document.
> >  Please be gentle with me :).
> >
> > Maybe we just need to create the index a little bit differently to better
> > handle this relationship.
> >
> > Thanks,
> >
> > Donnie
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> 
>  
>


Re: Architectural question structuring solr, multiple instances or filters

2012-03-02 Thread Erick Erickson
A lot depends on the size here. If each user has a zillion records,
consider multiple
indexes. But by and large, if they all fit in a single index the maintenance is
simpler if you just have a single index (core). And a single core also makes
somewhat more efficient use of memory etc.

Best
Erick

On Fri, Mar 2, 2012 at 2:25 AM, Ramo Karahasan
 wrote:
> Hi
>
>
>
> I face the issue that I have n business-users. Each business-user has its
> own set of products. I want to provide an interface for each business-user
> where he can find only the products he offers. What would be a better
> solution:
>
> 1.)    To have one big index and filter by customer-name?
>
> 2.)    Have multiple solr instances for each business-user
>
> 3.)    Another possiblities?
>
>
>
> I currently run one solr instance for the default non-business-user that
> can see all products, and now face the issue of how to separate the
> products for business-user searches.
>
>
>
> Any ideas would be appreciated.
>
>
>
> Thanks,
>
> Ramo
>


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-02 Thread Erick Erickson
Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog directory on the theory that *maybe* there was some
confusion happening on startup with an already-wonky index.

If you have the energy and tried that it might be helpful information,
but it may also be a total red-herring

FWIW
Erick

On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller  wrote:
>> I assuming the windows configuration looked correct?
>
> Yeah, so far I can not spot any smoking gun...I'm confounded at the moment. 
> I'll re read through everything once more...
>
> - Mark


Re: Solr Design question on spatial search

2012-03-02 Thread Erick Erickson
I don't see how this works, since your search for San could also return
San Marino, Italy. Would you then return all retail stores in
X miles of that city? What about San Salvador de Jujuy, Argentina?

And even in your example, San would match San Mateo. But should
the search then return any stores within X miles of San Mateo?
You have to stop somewhere

Is there any other information you have that restricts how far to expand the
search?

Best
Erick

On Thu, Mar 1, 2012 at 4:57 PM, Venu Gmail Dev  wrote:
> I don't think Spatial search will fully fit into this. I have 2 approaches in 
> mind but I am not satisfied with either one of them.
>
> a) Have 2 separate indexes. First one to store the information about all the 
> cities and second one to store the retail stores information. Whenever user 
> searches for a city then I return all the matching cities from first index 
> and then do a spatial search on each of the matched city in the second index. 
> But this is too costly.
>
> b) Index only the cities which have a nearby store. Do all the calculation(s) 
> before indexing the data so that the search is fast. The problem that I see 
> with this approach is that if a new retail store or a city is added then I 
> would have to re-index all the data again.
>
>
> On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:
>
>> I believe that what you need is spatial search...
>>
>> Have a look at the documentation: http://wiki.apache.org/solr/SpatialSearch
>>
>> On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar 
>> wrote:
>>
>>> Hello,
>>>
>>> I have a design question for Solr.
>>>
>>> I work for an enterprise which has a lot of retail stores (approx. 20K).
>>> These retail stores are spread across the world.  My search requirement is
>>> to find all the cities which are within x miles of a retail store.
>>>
>>> So lets say if we have a retail Store in San Francisco and if I search for
>>> "San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
>>> returned as they are within x miles from San Francisco. I also want to rank
>>> the search results by their distance.
>>>
>>> I can create an index with all the cities in it but I am not sure how do I
>>> ensure that the cities returned in a search result have a nearby retail
>>> store. Any suggestions ?
>>>
>>> Thanks,
>>> Venu,
>>>
>>
>>
>>
>> --
>> Dirceu Vieira Júnior
>> ---
>> +47 9753 2473
>> dirceuvjr.blogspot.com
>> twitter.com/dirceuvjr
>


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread andrew
I posted the files here: http://www.mediafire.com/?z43a5qyfvz4zxp1


--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3793496.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr web admin in xml format

2012-03-02 Thread Ricardo F

Interesting, with curl I get the content in xml format.
Thanks!

> CC: solr-user@lucene.apache.org
> From: erik.hatc...@gmail.com
> Subject: Re: Solr web admin in xml format
> Date: Fri, 2 Mar 2012 07:59:16 -0500
> To: solr-user@lucene.apache.org
>
> Not at my computer at the moment but there are request handlers that can give 
> you those details as as well as JMX.
>
> But the stats page IS XML :). View source. :)
>
> On Mar 2, 2012, at 7:50, Ricardo F  wrote:
>
> >
> > Get values from the statistics web, but in xml format for parse it with a 
> > perl script.
> > Thanks
> >
> > 
> >> Date: Fri, 2 Mar 2012 12:51:00 +0100
> >> From: matheis.ste...@googlemail.com
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr web admin in xml format
> >>
> >> Ricardo, What exactly do you need?
> >>
> >> On Friday, March 2, 2012 at 12:05 PM, Ricardo F wrote:
> >>
> >>>
> >>> Hello,
> >>> How can I get the output of the web interface in xml format? I need it 
> >>> for munin monitoring.
> >>>
> >>> Thanks
> >>
> >>
> >
  

Re: Building a resilient cluster

2012-03-02 Thread Erick Erickson
One other fault-tolerance issue is that you'll need at least one replica
per shard. As I understand it, at least *one* machine has to be running
for each shard for the cluster to work.

This doesn't address the shardId issue, but is something to keep in
mind when testing.

Best
Erick

On Wed, Feb 29, 2012 at 2:30 AM, Ranjan Bagchi  wrote:
> Hi,
>
> I'm interested in setting up a solr cluster where each machine [at least
> initially] hosts a separate shard of a big index [too big to sit on the
> machine].  I'm able to put a cloud together by telling it that I have (to
> start out with) 4 nodes, and then starting up nodes on 3 machines pointing
> at the zkInstance.  I'm able to load my sharded data onto each machine
> individually and it seems to work.
>
> My concern is that it's not fault tolerant:  if one of the non-zookeeper
> machines falls over, the whole cluster won't work.  Also, I can't create a
> shard with more data, and have it work within the existing cloud.
>
> I tried using -DshardId=shard5 [on an existing 4-shard cluster], but it
> just started replicating, which doesn't seem right.
>
> Are there ways around this?
>
> Thanks,
> Ranjan Bagchi


Re: Is there a way to implement a IntRangeField in Solr?

2012-03-02 Thread Erick Erickson
First, I really don't understand why you would have OOMs when
indexing even a humongous number of dates, that just seems weird.

But what happens if you think about it the other way? Instead of indexing
open dates, index booked dates. Then construct filter queries like
fq=-booked:[5 TO 23], where the range is the proposed reservation
date. You'll have to do something creative with the
numbers, possibly the number of days since the epoch so you can
cross years etc..
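A sketch of that encoding — days since the Unix epoch, so ranges can cross year boundaries. The booked field name and the checkout-exclusive date semantics are assumptions for illustration:

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def day_number(d: date) -> int:
    # Days since the epoch; monotonically increasing across years.
    return (d - EPOCH).days

# Hypothetical stay from 2012-03-05 to 2012-03-23 (checkout day exclusive):
start = day_number(date(2012, 3, 5))
end = day_number(date(2012, 3, 23))

# Negated range over the booked days, as in fq=-booked:[start TO end-1]
fq = "-booked:[%d TO %d]" % (start, end - 1)
print(fq)
```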

You can also prune the booked dates in the past to keep the docs
smaller...

Best
Erick

On Wed, Feb 29, 2012 at 2:23 PM, Mikhail Khludnev
 wrote:
> AFAIK join is done in the single core. Same core should have two types of
> documents.
> Pls let me know about your achievement.
>
> On Wed, Feb 29, 2012 at 8:46 PM, federico.wachs
> wrote:
>
>> I'll give this a try. I'm not sure I completely understand how to do that
>> because I don't have so much experience with Solr. Do I have to use another
>> core to post a different kind of document and then join it?
>>
>> Thanks!
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3787873.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> 
>  


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Robert Muir
On Fri, Mar 2, 2012 at 7:37 AM, andrew  wrote:
> I was able to create a test case.
>
> We are querying ranges of documents. When I tried to isolate the document
> that causes trouble, I found it happens with exactly every second request
> only for a single document query (it fails constantly when requesting a
> range of documents where that document is included). I could also reproduce
> the exception with only that single document in the index.
>
> I think it is not a good idea to post the Solr  XML here - it is very
> long (text extract of a newspaper page) and may not reproduce verbatim
> (whitespace etc.) if I paste it here.
>
> iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
> config) via email?
>

You can also open a jira issue
(https://issues.apache.org/jira/browse/SOLR), and upload everything as
attachments.

I would also be very interested if you can test a nightly 3.6 build
(https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/)

There have been *numerous* offsets bugs fixed in 3.6 in a variety of
tokenizers/tokenfilters besides the HTMLStripCharFilter:
https://issues.apache.org/jira/browse/LUCENE-3642
https://issues.apache.org/jira/browse/SOLR-2891
https://issues.apache.org/jira/browse/LUCENE-3717

-- 
lucidimagination.com


RE: Solr web admin in xml format

2012-03-02 Thread Ahmet Arslan

> Get values from the statistics web, but in xml format for
> parse it with a perl script.

Actually http://localhost:8080/solr/coreName/admin/stats.jsp is XML already.
It is transformed with stats.xsl to generate the web page.

You can use http://wiki.apache.org/solr/SolrJmx to retrieve stats too.

Also you might find this interesting http://sematext.com/spm/index.html
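For instance, the stats XML can be pulled apart with a few lines of Python. The fragment below is an illustrative sample only — the real stats.jsp layout varies by Solr version, so treat the element names as assumptions:

```python
import xml.etree.ElementTree as ET

# Illustrative sample of a stats.jsp-style document (not the exact layout).
sample = """
<solr>
  <solr-info>
    <QUERYHANDLER>
      <entry>
        <name>standard</name>
        <stats>
          <stat name="requests">1234</stat>
          <stat name="avgTimePerRequest">5.6</stat>
        </stats>
      </entry>
    </QUERYHANDLER>
  </solr-info>
</solr>
"""

root = ET.fromstring(sample)
# Collect every <stat> into a name -> value dict for easy monitoring output.
stats = {s.get("name"): s.text.strip() for s in root.iter("stat")}
print(stats["requests"])
```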



Re: Solr web admin in xml format

2012-03-02 Thread Erik Hatcher
Not at my computer at the moment but there are request handlers that can give 
you those details as as well as JMX.  

But the stats page IS XML :). View source. :)

On Mar 2, 2012, at 7:50, Ricardo F  wrote:

> 
> Get values from the statistics web, but in xml format for parse it with a 
> perl script.
> Thanks
> 
> 
>> Date: Fri, 2 Mar 2012 12:51:00 +0100
>> From: matheis.ste...@googlemail.com
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr web admin in xml format
>> 
>> Ricardo, What exactly do you need?
>> 
>> On Friday, March 2, 2012 at 12:05 PM, Ricardo F wrote:
>> 
>>> 
>>> Hello,
>>> How can I get the output of the web interface in xml format? I need it for 
>>> munin monitoring.
>>> 
>>> Thanks
>> 
>> 
> 


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Ahmet Arslan
> I think it is not a good idea to post the Solr 
> XML here - it is very
> long (text extract of a newspaper page) and may not
> reproduce verbatim
> (whitespace etc.) if I paste it here. 
> 
> iorixxx, koji - is it ok if I send the necessary artifacts
> (add XML, schema,
> config) via email?

I saw people using http://pastebin.com/ for this purposes before. Can you 
provide your full search URL too?


RE: Solr web admin in xml format

2012-03-02 Thread Ricardo F

Get values from the statistics web page, but in xml format, to parse it with
a Perl script.
Thanks


> Date: Fri, 2 Mar 2012 12:51:00 +0100
> From: matheis.ste...@googlemail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Solr web admin in xml format
>
> Ricardo, What exactly do you need?
>
> On Friday, March 2, 2012 at 12:05 PM, Ricardo F wrote:
>
> >
> > Hello,
> > How can I get the output of the web interface in xml format? I need it for 
> > munin monitoring.
> >
> > Thanks
>
>
  

Re: Dismax weird behaior wrt defType

2012-03-02 Thread Ahmet Arslan

> Query 1: 
> http://localhost:8085/solr/select/?q=abc&version=2.2&start=0&rows=10&indent=on&defType=dismax
> [defType with capital T -- does not fetch results]
> 
> Query 2: 
> http://localhost:8085/solr/select/?q=abc&version=2.2&start=0&rows=10&indent=on&deftype=dismax
> [defType with small T -- perfect, results returned]
> 

Your second query example uses the lucene query parser (which is the default).
You can confirm this via adding &debugQuery=on. It searches on the default
search field (defined in schema.xml).

&deftype=dismax does not change query parser at all.

Your first query example uses dismax, searches on qf (query fields). Your 
search URL does not contain qf, so I assume it is defined in defaults section? 
Try adding field(s) that you want to search on.

solr/select/?q=abc&start=0&rows=10&defType=dismax&qf=myField


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread andrew
I was able to create a test case.

We are querying ranges of documents. When I tried to isolate the document
that causes trouble, I found it happens with exactly every second request
only for a single document query (it fails constantly when requesting a
range of documents where that document is included). I could also reproduce
the exception with only that single document in the index.

I think it is not a good idea to post the Solr  XML here - it is very
long (text extract of a newspaper page) and may not reproduce verbatim
(whitespace etc.) if I paste it here. 

iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
config) via email?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3793347.html
Sent from the Solr - User mailing list archive at Nabble.com.


Retrieving multiple levels with hierarchical faceting in Solr

2012-03-02 Thread adrian.strin...@holidaylettings.co.uk
I've got a hierarchical facet in my Solr collection; root level values are 
prefixed with 0;, and the next level is prefixed 1_foovalue;.  I can get the 
root level easily enough, but when foovalue is selected I need to retrieve the 
next level in the hierarchy while still displaying all of the options in the 
root level.  I can't work out how to request either two different prefixes for 
the facet, or the same facet twice using different prefixes.

I've found a couple of discussions online that suggest I ought to be able to 
set the prefix using local params:

facet.field={!prefix=0;}foo
facet.field={!prefix=1_foovalue; key=bar}foo

but the prefix seems to be ignored, as the facet returned contains all values.  
Should I just add a copyField so I can query using
f.foo.facet.prefix=0;&f.bar.facet.prefix=1_foovalue;, or is there another way I
can request the two different levels of my facet hierarchy at once?

I'm using Solr 3.5.
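If the facet field is copied into a second field (e.g. via copyField — the field name bar here is an assumption), the two prefixes can be requested per-field in one request; a sketch of the parameter list:

```python
from urllib.parse import urlencode

# bar is assumed to be a copyField of foo, so each field can get its own prefix.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "foo"),
    ("f.foo.facet.prefix", "0;"),
    ("facet.field", "bar"),
    ("f.bar.facet.prefix", "1_foovalue;"),
]
qs = urlencode(params)
print(qs)
```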

Thanks,
Ade



Re: Solr web admin in xml format

2012-03-02 Thread Stefan Matheis
Ricardo, What exactly do you need?

On Friday, March 2, 2012 at 12:05 PM, Ricardo F wrote:

> 
> Hello,
> How can I get the output of the web interface in xml format? I need it for 
> munin monitoring.
> 
> Thanks 




Solr web admin in xml format

2012-03-02 Thread Ricardo F

Hello,
   How can I get the output of the web interface in xml format?   I need it for 
munin monitoring.

Thanks

Re: Search by url starting with

2012-03-02 Thread Ahmet Arslan
> I am crawling my site using Nutch and posting it to
> Solr.  I am trying to
> implement a feature where I want to get all data where url
> starts with
> "http://someurl/";

What is your field type for url? If its string type, then you can use this:

&q={!prefix f=url}http://someurl/

http://lucene.apache.org/solr/api/org/apache/solr/search/PrefixQParserPlugin.html



Dismax weird behaior wrt defType

2012-03-02 Thread Husain, Yavar
A weird behavior with respect to "defType". Any clues will be appreciated.

Query 1: 
http://localhost:8085/solr/select/?q=abc&version=2.2&start=0&rows=10&indent=on&defType=dismax
 [defType with capital T -- does not fetch results]

Query 2: 
http://localhost:8085/solr/select/?q=abc&version=2.2&start=0&rows=10&indent=on&deftype=dismax
 [defType with small T -- perfect, results returned]

In the above queries I have removed the boosting part, which is the reason I am
using dismax. Also, with the boosting parameters in place it is the other way
round: defType returns results with a capital T and does not with a small t.





Re: Architectural question structuring solr, multiple instances or filters

2012-03-02 Thread Mikhail Khludnev
1.)To have one big index and filter by customer-name

On Fri, Mar 2, 2012 at 11:25 AM, Ramo Karahasan <
ramo.karaha...@googlemail.com> wrote:

> 1.)To have one big index and filter by customer-name




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics





Re: A sorting question.

2012-03-02 Thread Luis Cappa Banda
The only reference I found is:

http://stackoverflow.com/questions/5753079/solr-query-without-order

Has anyone had the same problem? Maybe using a dynamic field could solve this
issue?

Thanks!


Luis Cappa.


2012/3/2 Luis Cappa Banda 

> Hello!
>
> Just a brief question. I'm querying by my docs ids to retrieve the whole
> document data from them, and I would like to retrieve them in the same
> order as I queried. Example:
>
> *q*=id:(A+OR+B+OR+C+OR...)
>
> And I would like to get a response with a default order like:
>
> response:
>
> *docA*:{
>
>  }
>
>
> *docB*:{
>
>  }
>
>
> *docC*:{
>
>  }
>
> Etc.
>
>
> The default response returns the documents in a different order, which I
> suppose is due to Solr's internal scoring. The ids are not numeric, so there
> is no option to order them with numeric logic. Any suggestion?
>
> Thanks a lot!
>
>
>
> Luis Cappa.
>

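Since Lucene returns hits by score rather than by the order of the OR clauses, one common workaround is to re-sort the documents on the client side against the id list that was queried. A minimal sketch (the data shapes are illustrative, not from the thread):

```python
def reorder_by_query(ids, docs):
    """Re-sort Solr docs to match the order in which their ids were queried.

    ids:  the id list exactly as it appeared in the query, e.g. ["A", "B", "C"]
    docs: the returned documents, each a dict with an "id" key
    """
    position = {doc_id: i for i, doc_id in enumerate(ids)}
    return sorted(docs, key=lambda doc: position[doc["id"]])

queried_ids = ["A", "B", "C"]
returned = [{"id": "C"}, {"id": "A"}, {"id": "B"}]  # order Solr scored them
ordered = reorder_by_query(queried_ids, returned)
```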

A sorting question.

2012-03-02 Thread Luis Cappa Banda
Hello!

Just a brief question. I'm querying by my docs ids to retrieve the whole
document data from them, and I would like to retrieve them in the same
order as I queried. Example:

*q*=id:(A+OR+B+OR+C+OR...)

And I would like to get a response with a default order like:

response:

*docA*:{

 }


*docB*:{

 }


*docC*:{

 }

Etc.


The default response returns the documents in a different order, which I
suppose is due to Solr's internal scoring. The ids are not numeric, so there
is no option to order them with numeric logic. Any suggestion?

Thanks a lot!



Luis Cappa.


Re: Too many values for UnInvertedField faceting on field topic

2012-03-02 Thread Michael Jakl
Hi!

On Thu, Mar 1, 2012 at 23:54, Yonik Seeley  wrote:
> On Thu, Mar 1, 2012 at 3:34 AM, Michael Jakl  wrote:
>> The topic field holds roughly 5
>> values per doc, but I wasn't able to compute the exact number just
>> now.
>
> How many unique values for that field in the whole index?
> If you have log output (or output from the stats page for
> fieldValueCache) that should tell you exactly.

I'm sorry, I've already reduced the size of the index and I'm in the
process of splitting it into a few shards. Solr couldn't build the
fieldValueCache for this particular field (that's where the exception
came from).

Thanks,
Michael
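
For anyone hitting the same exception: a commonly used workaround, besides sharding, is to switch the offending field to facet.method=enum, which computes counts with per-term filter intersections instead of building an UnInvertedField. Sketched as request parameters (a sketch, not the setup from this thread):

```python
# Per-field override: only "topic" switches to enum; other facet fields
# keep the default method. enum trades the UnInvertedField's memory cost
# for one filter intersection per unique term.
facet_params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "topic",
    "f.topic.facet.method": "enum",
}
```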


Re: alphanumeric buckets

2012-03-02 Thread AlexR
Oh no, sorry.

I need more than one; that was only an example.

0 - 9
A - F
G - I
M - R
S - Z

It would be so easy if I could get all persons in such an interval with
fq=person:[A TO F], not only exact matches on entries.

Thanks,
Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/alphanumeric-buckets-tp3790990p3792803.html
Sent from the Solr - User mailing list archive at Nabble.com.
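
The bucket counts described above map onto Solr's facet.query parameter, one range query per bucket, while fq=person:[A TO F] restricts the result list to a single bucket. A minimal sketch of the request parameters (field name taken from the post; note that string ranges are lexicographic, so an upper bound of F ends at the bare term "F" and a name like "Fred" would need a wider bound):

```python
# facet.query may repeat, so build the params as a list of tuples.
# Each bucket becomes one range query; counts come back per query.
bucket_queries = [
    ("facet.query", "person:[0 TO 9]"),
    ("facet.query", "person:[A TO F]"),
    ("facet.query", "person:[G TO I]"),
    ("facet.query", "person:[M TO R]"),
    ("facet.query", "person:[S TO Z]"),
]
params = [("q", "*:*"), ("facet", "true")] + bucket_queries
```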