Re: Two unrelated questions

2011-09-21 Thread tamanjit.bin...@yahoo.co.in
For *1* I have faced similar issues, and have realized that it has more to
do with the data I am trying to index. Unless it's a flat table that I am
trying to index, there are often issues at the data end when I try to join
tables and then index the data, even on a full-import with DIH.

I am not sure whether you are joining two tables. If not, I would suggest that
you re-check your data and then re-index using a full-import.



RE: NRT and commit behavior

2011-09-21 Thread Tirthankar Chatterjee
Okay, but is there any threshold (index size, total docs in the index, or
size of physical memory) at which sharding should be considered?

I am trying to find the winning combination.
Tirthankar
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, September 16, 2011 7:46 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT and commit behavior

Uhm, you're putting a lot of index into not very much memory. I really think
you're going to have to shard your index across several machines to get past
this problem. Simply increasing the size of your caches is still limited by the
physical memory you're working with.

You really have to put a profiler on the system to see what's going on. At that
size there are too many things that it *could* be to definitively answer it
over e-mail.

Best
Erick

On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee 
 wrote:
> Erick,
> Also, in our solrconfig we have tried increasing the caches. Setting the
> autowarmCount values below to 0 helps the commit call return within a
> second, but that will slow us down on searches:
>
>    <filterCache
>      class="solr.FastLRUCache"
>      size="16384"
>      initialSize="4096"
>      autowarmCount="4096"/>
>
>    <queryResultCache
>      class="solr.LRUCache"
>      size="16384"
>      initialSize="4096"
>      autowarmCount="4096"/>
>
>    <documentCache
>      class="solr.LRUCache"
>      size="512"
>      initialSize="512"
>      autowarmCount="512"/>
>
> -Original Message-
> From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
> Sent: Wednesday, September 14, 2011 7:31 AM
> To: solr-user@lucene.apache.org
> Subject: RE: NRT and commit behavior
>
> Erick,
> Here are the answers to your questions:
> Our index is 267 GB.
> We are not optimizing.
> No, we have not profiled yet to check the bottleneck, but the logs indicate
> that opening the searchers is taking time.
> Nothing except Solr runs on the machine.
> Total memory is 16GB; Tomcat has 8GB allocated. Everything is 64-bit: OS,
> JVM and Tomcat.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, September 11, 2011 11:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: NRT and commit behavior
>
> Hmm, OK. You might want to look at the non-cached filter query stuff; it's
> quite recent. The point here is that it is a filter that is applied only
> after all of the less expensive filter queries are run. One of its uses is
> exactly ACL calculations. Rather than calculate the ACL for the entire doc
> set, it only calculates access for docs that have made it past all the other
> elements of the query. See SOLR-2429, and note that it is 3.4-only
> (currently being released).
>
> As to why your commits are taking so long, I have no idea given that you 
> really haven't given us much to work with.
>
> How big is your index? Are you optimizing? Have you profiled the application
> to see what the bottleneck is (I/O, CPU, etc.)? What else is running on your
> machine? It's quite surprising that it takes that long. How much memory are
> you giving the JVM? etc...
>
> You might want to review: 
> http://wiki.apache.org/solr/UsingMailingLists
>
> Best
> Erick
>
>
> On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee 
>  wrote:
>> Erick,
>> What you said is correct. For us, the searches are based on some Active
>> Directory permissions which are populated in the filter query parameter, so
>> we don't have any warming query concept, as we cannot fire one for every
>> user ahead of time.
>>
>> What we do here is that when a user logs in, we do an invalid query (which
>> returns no results instead of '*') with the correct filter query (which is
>> his permissions based on the login). This way the cache gets warmed up with
>> valid docs.
>>
>> It works then.
>>
>>
>> Also, can you please let me know why commit takes 45 minutes to 1 hour on
>> well-resourced hardware with multiple processors, 16GB RAM, a 64-bit VM,
>> etc.? We tried passing waitSearcher as false and found that inside the code
>> it is hard-coded to true. Is there any specific reason? Can we change that
>> value to honor what is being passed?
>>
>> Thanks,
>> Tirthankar
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Thursday, September 01, 2011 8:38 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: NRT and commit behavior
>>
>> Hmm, I'm guessing a bit here, but using an invalid query doesn't sound very
>> safe, though I suppose it *might* be OK.
>>
>> What does "invalid" mean? Syntax error? Not safe.
>>
>> A search that returns 0 results? I don't know, but I'd guess that
>> filling your caches, which is the point of warming queries, might be
>> short-circuited if the query returns 0 results, but I don't know for sure.
>>
>> But the fact that "invalid queries return quicker" does not inspire
>> confidence, since the *point* of warming queries is to spend the

Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Michael Sokolov
I wonder if config-file validation would be helpful here :) I posted a 
patch in SOLR-1758 once.


-Mike

On 9/21/2011 6:22 PM, Michael Ryan wrote:

I think the problem is that the <mergePolicy> config needs to be inside of the
<indexDefaults> config, rather than after it as you have.

-Michael




Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Jason Toy
I am running the Sun version:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

I get multiple out-of-memory exceptions looking at my application and the
Solr logs, but my script doesn't get called the first time or on later
occurrences, which is why I was thinking that maybe Solr is doing something
different. My script notifies me of the memory exception and then restarts the
JVM. Running the script manually works fine. I'll try to do some more testing
to see what exactly is going on.

Jason

On Wed, Sep 21, 2011 at 2:31 PM, Chris Hostetter
wrote:

>
> : Usually any good piece of java code refrains from capturing Throwable
> : so that Errors will bubble up unlike exceptions. Having said that,
>
> Even if some piece of code catches an OutOfMemoryError, the JVM should
> have already called the "-XX:OnOutOfMemoryError" hook. Although from what
> I can tell, the JVM will only call the hook on the *first* OOM thrown.
>
> (you can try the code below to test this behavior in your own JVM)
>
> : > I'm trying to be notified when the error occurs.  I saw with the jvm I
> can
> : > pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every
> time
> : > the out of memory issue occurs though my script never runs. Does solr
> let
>
> ...exactly what JVM are you running?  this option is specific to the
> Sun/Oracle JVM.  For example, in the IBM JVM, there is a completely
> different mechanism...
>
>
> http://stackoverflow.com/questions/3467219/is-there-something-like-xxonerror-or-xxonoutofmemoryerror-in-ibm-jvm
>
>
> -- Simple OnOutOfMemoryError hook test -
> import static java.lang.System.out;
> import java.util.ArrayList;
> public final class Test {
>  public static void main(String... args) throws Exception {
>ArrayList data = new ArrayList(1000);
>for (int i=0; i<5; i++) {
>  try {
>while (i < 5) {
>  data.add(new ArrayList(10));
>}
>  } catch (OutOfMemoryError oom) {
>data.clear();
>out.println("caught");
>  }
>}
>  }
> }
> -- example of running it ---
> hossman@bester:~/tmp$ java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> hossman@bester:~/tmp$ java -XX:OnOutOfMemoryError="echo HOOK" -Xmx64M Test
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="echo HOOK"
> #   Executing /bin/sh -c "echo HOOK"...
> HOOK
> caught
> caught
> caught
> caught
> caught
> hossman@bester:~/tmp$
> --
>
>
>
> -Hoss




-- 
- sent from my mobile
6176064373


RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Michael Ryan
I think the problem is that the <mergePolicy> config needs to be inside of the
<indexDefaults> config, rather than after it as you have.

-Michael


Re: SOLR error with custom FacetComponent

2011-09-21 Thread Erik Hatcher
Why create a custom facet component for this?

Simply add lines like this to your request handler(s):

<str name="facet.field">manu_exact</str>

either in the defaults or appends sections.

Erik



On Sep 21, 2011, at 14:00 , Ravi Bulusu wrote:

> Hi All,
> 
> 
> I'm trying to write a custom SOLR facet component and I'm getting some
> errors when I deploy my code into the SOLR server.
> 
> Can you please let me know what I'm doing wrong? I appreciate your help on
> this issue. Thanks.
> 
> *Issue*
> 
> I'm getting an error saying "Error instantiating SearchComponent: <my custom
> class> is not a org.apache.solr.handler.component.SearchComponent".
> 
> My custom class inherits from *FacetComponent* which extends from *
> SearchComponent*.
> 
> My custom class is defined as follows…
> 
> I implemented the process method to meet our functionality.
> 
> We have some default facets that have to be sent every time, irrespective of
> the Query request.
> 
> 
> /**
> 
> *
> 
> * @author ravibulusu
> 
> */
> 
> public class MyFacetComponent extends FacetComponent {
> 
> ….
> 
> }



Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Shawn Heisey

On 9/21/2011 11:18 AM, Shawn Heisey wrote:
With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem 
to be ignored.  I've got both set to 35, but Solr is merging every 10 
segments.  I haven't tried explicitly setting mergeFactor yet to see 
if that will make the other settings override it, I'm letting the 
current import finish first.


I have tried again with mergeFactor set to 8 and the other settings in 
mergePolicy remaining at 35.  It merged after every 8th segment.  This 
is on lucene_solr_3_4 checked out from SVN, with SOLR-1972 manually 
applied.  Settings used this time:



<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>8</mergeFactor>

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">4</int>
  </mergeScheduler>

  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <!-- unidentified setting (value: 1) -->
  <lockType>native</lockType>
</indexDefaults>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

If there's anything else you'd like me to do, please let me know and 
I'll get to it as soon as I can.


Thanks,
Shawn



SOLR error with custom FacetComponent

2011-09-21 Thread Ravi Bulusu
Hi All,


I'm trying to write a custom SOLR facet component and I'm getting some
errors when I deploy my code into the SOLR server.

Can you please let me know what I'm doing wrong? I appreciate your help on
this issue. Thanks.

*Issue*

I'm getting an error saying "Error instantiating SearchComponent: <my custom
class> is not a org.apache.solr.handler.component.SearchComponent".

My custom class inherits from *FacetComponent* which extends from *
SearchComponent*.

My custom class is defined as follows…

I implemented the process method to meet our functionality.

We have some default facets that have to be sent every time, irrespective of
the Query request.


/**

 *

 * @author ravibulusu

 */

public class MyFacetComponent extends FacetComponent {

….

}


Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Shawn Heisey

On 9/21/2011 3:10 PM, Chris Hostetter wrote:

: With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be
: ignored.  I've got both set to 35, but Solr is merging every 10 segments.  I
...
: Here's the relevant config pieces.  These two sections are in separate files
: incorporated into solrconfig.xml using xinclude:
:
:
...

do you have a "<mainIndex>" section with mergeFactor defined there?


The mergeFactor section is in my config, but it's commented out.  I left 
out the commented sections when I included it before.  It doesn't appear 
anywhere else.  Here's the full config snippet with comments:



<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <!-- <mergeFactor> commented out -->

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">4</int>
  </mergeScheduler>

  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <!-- unidentified setting (value: 1) -->
  <lockType>native</lockType>
</indexDefaults>

Here's the mainIndex section:


<mainIndex>
  <unlockOnStartup>true</unlockOnStartup>
  <reopenReaders>true</reopenReaders>

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>

  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>


Thanks,
Shawn



Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Chris Hostetter

: Usually any good piece of java code refrains from capturing Throwable
: so that Errors will bubble up unlike exceptions. Having said that,

Even if some piece of code catches an OutOfMemoryError, the JVM should
have already called the "-XX:OnOutOfMemoryError" hook. Although from what
I can tell, the JVM will only call the hook on the *first* OOM thrown.

(you can try the code below to test this behavior in your own JVM) 

: > I'm trying to be notified when the error occurs.  I saw with the jvm I can
: > pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time
: > the out of memory issue occurs though my script never runs. Does solr let

...exactly what JVM are you running?  this option is specific to the
Sun/Oracle JVM.  For example, in the IBM JVM, there is a completely
different mechanism...

http://stackoverflow.com/questions/3467219/is-there-something-like-xxonerror-or-xxonoutofmemoryerror-in-ibm-jvm


-- Simple OnOutOfMemoryError hook test -
import static java.lang.System.out;
import java.util.ArrayList;
public final class Test {
  public static void main(String... args) throws Exception {
ArrayList data = new ArrayList(1000);
for (int i=0; i<5; i++) {
  try {
while (i < 5) {
  data.add(new ArrayList(10));
}
  } catch (OutOfMemoryError oom) {
data.clear();
out.println("caught");
  }
}
  }
}
-- example of running it ---
hossman@bester:~/tmp$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
hossman@bester:~/tmp$ java -XX:OnOutOfMemoryError="echo HOOK" -Xmx64M Test 
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="echo HOOK"
#   Executing /bin/sh -c "echo HOOK"...
HOOK
caught
caught
caught
caught
caught
hossman@bester:~/tmp$ 
--



-Hoss

Re: Two unrelated questions

2011-09-21 Thread Rob Casson
For #1, I don't use DIH, but is there any possibility of that column
having duplicate keys, with subsequent docs replacing existing ones?

And for #2, for some cases you could use a negative filter query:

 http://wiki.apache.org/solr/SimpleFacetParameters#Retrieve_docs_with_facets_missing

So instead of the generic "fq=-facetField:[* TO *]", something like
"fq=-car_make:Taurus". Picking "negatives" might even make the UI a
bit easier.
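
For instance, a minimal SolrJ sketch of the negative-filter idea (the server
URL and field names here are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NegativeFilterExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("make:Ford");
    // exclude one model instead of OR-ing together every model you do want
    query.addFilterQuery("-model:Taurus");
    QueryResponse rsp = server.query(query);
    System.out.println("matches: " + rsp.getResults().getNumFound());
  }
}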

Anyway, just some thoughts. Cheers,
rob

On Wed, Sep 21, 2011 at 5:17 PM, Olson, Ron  wrote:
> Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a 
> PK field, generated by a sequence, so there are records with ID of 1, 2, 3, 
> etc. That same id is the one I use in my unique id field in the document 
> (ID).
>
> I've noticed that the table has, say, 10 rows. My index only has 8. I don't 
> know why that is, but I'd like to figure out which records are missing and 
> add them (and hopefully understand why they weren't added in the first 
> place). I was just wondering if there was some way to compare the two as part 
> of a sql query, but on reflection, it does seem like an absurd request, so I 
> apologize; I think what I'll have to do is write a solrj program that gets 
> every ID in the table, then does a search on that ID in the index, and add 
> the ones that are missing.
>
> Regarding the second item, yes, it's crazy but I'm not sure what to do; there 
> really are that many options and some searches will be extremely specific, 
> yet broad enough in terms for this to be a problem.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, September 21, 2011 3:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Two unrelated questions
>
> for <1> I don't quite get what you're driving at. Your DIH
> query assigns the uniqueKey, it's not like it's something
> auto-generated. Perhaps a concrete example would
> help.
>
> <2> There's a limit you can adjust that defaults to
> 1024 (maxBooleanClauses in solrconfig.xml). You can
> bump this limit very high, but you're right: if anyone actually
> does something absurd it'll slow *that* query down. But
> just raising the limit won't change performance unless
> someone actually puts a ton of items in a query...
>
> Best
> Erick
>
> On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron  wrote:
>> Hi all-
>>
>> I'm not sure if I should break this out into two separate questions to the 
>> list for searching purposes, or if one is more acceptable (don't want to 
>> flood).
>>
>> I have two (hopefully) straightforward questions:
>>
>> 1. Is it possible to expose the unique ID of a document to a DIH query? The 
>> reason I want to do this is because I use the unique ID of the row in the 
>> table as the unique ID of the Lucene document, but I've noticed that the 
>> count of documents doesn't match the count in the table; I'd like to add
>> these rows and was hoping to avoid writing a custom SolrJ app to do it.
>>
>> 2. Is there any limit to the number of conditions in a Boolean search? We're 
>> working on a new project where the user can choose either, for example, 
>> "Ford Vehicles", in which case I can simply search for "Ford", but if the 
>> user chooses specific makes and models, then I have to say something like 
>> "Crown Vic OR Focus OR Taurus OR F-150", etc., where they could 
>> theoretically choose every model of Ford ever made except one. This could 
>> lead to a *very* large query, and I was worried both about whether it was
>> even possible and about the impact on performance.
>>
>>
>> Thanks, and I apologize if this really should be two separate messages.
>>
>> Ron

RE: Two unrelated questions

2011-09-21 Thread Olson, Ron
Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a PK 
field, generated by a sequence, so there are records with ID of 1, 2, 3, etc. 
That same id is the one I use in my unique id field in the document 
(ID).

I've noticed that the table has, say, 10 rows. My index only has 8. I don't 
know why that is, but I'd like to figure out which records are missing and add 
them (and hopefully understand why they weren't added in the first place). I 
was just wondering if there was some way to compare the two as part of a sql 
query, but on reflection, it does seem like an absurd request, so I apologize; 
I think what I'll have to do is write a SolrJ program that gets every ID in
the table, then does a search on that ID in the index, and adds the ones that
are missing.

Regarding the second item, yes, it's crazy but I'm not sure what to do; there 
really are that many options and some searches will be extremely specific, yet 
broad enough in terms for this to be a problem.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 21, 2011 3:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Two unrelated questions

for <1> I don't quite get what you're driving at. Your DIH
query assigns the uniqueKey, it's not like it's something
auto-generated. Perhaps a concrete example would
help.

<2> There's a limit you can adjust that defaults to
1024 (maxBooleanClauses in solrconfig.xml). You can
bump this limit very high, but you're right: if anyone actually
does something absurd it'll slow *that* query down. But
just raising the limit won't change performance unless
someone actually puts a ton of items in a query...

Best
Erick

On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron  wrote:
> Hi all-
>
> I'm not sure if I should break this out into two separate questions to the 
> list for searching purposes, or if one is more acceptable (don't want to 
> flood).
>
> I have two (hopefully) straightforward questions:
>
> 1. Is it possible to expose the unique ID of a document to a DIH query? The 
> reason I want to do this is because I use the unique ID of the row in the 
> table as the unique ID of the Lucene document, but I've noticed that the 
> count of documents doesn't match the count in the table; I'd like to add
> these rows and was hoping to avoid writing a custom SolrJ app to do it.
>
> 2. Is there any limit to the number of conditions in a Boolean search? We're 
> working on a new project where the user can choose either, for example, "Ford 
> Vehicles", in which case I can simply search for "Ford", but if the user 
> chooses specific makes and models, then I have to say something like "Crown 
> Vic OR Focus OR Taurus OR F-150", etc., where they could theoretically choose 
> every model of Ford ever made except one. This could lead to a *very* large 
> query, and I was worried both about whether it was even possible and about
> the impact on performance.
>
>
> Thanks, and I apologize if this really should be two separate messages.
>
> Ron




Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Chris Hostetter

: With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be
: ignored.  I've got both set to 35, but Solr is merging every 10 segments.  I
...
: Here's the relevant config pieces.  These two sections are in separate files
: incorporated into solrconfig.xml using xinclude:
: 
: 
...

do you have a "<mainIndex>" section with mergeFactor defined there?

-Hoss


Re: Slow autocomplete(terms)

2011-09-21 Thread Erick Erickson
Think about ngrams if you really need infix searches;
you're right that the regex is very probably the
root of your problem. The index has to examine
*every* term in the field to determine if the regex
will match.
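
As an illustration, here's a small Lucene sketch (the tokenizer choice and
the gram size are just assumptions for the example) of how ngrams turn an
infix match into a plain term lookup:

import java.io.StringReader;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NGramDemo {
  public static void main(String[] args) throws Exception {
    // at index time, "manchester" is expanded into 5-grams such as "chest",
    // so a query for "chest" becomes an exact term match; no regex scan needed
    TokenStream ts = new NGramTokenFilter(
        new KeywordTokenizer(new StringReader("manchester")), 5, 5);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.toString());  // manch, anche, nches, chest, ...
    }
    ts.end();
    ts.close();
  }
}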

Best
Erick

On Tue, Sep 20, 2011 at 12:57 AM, roySolr  wrote:
> Hello,
>
> I used the terms request for autocomplete. It works fine with 200,000
> records, but with 2 million docs it's very slow.
>
> I use some regex to allow autocomplete in the middle of words, for example:
> chest -> manchester.
>
> My call(pecl PHP solr):
>
> $query = new SolrQuery();
> $query->setTermsLimit("10");
>
> $query->setTerms(true);
> $query->setTermsField($field);
>
> $term = SolrUtils::escapeQueryChars ($term);
> $query->set("terms.regex","(.*)$term(.*)");
> $query->set("terms.regex.flag","case_insensitive");
>
> URL:
> /solr/terms?terms.fl=autocompletewhat&terms.regex=(.*)chest(.*)&terms.regex.flag=case_insensitive&terms=true
>
> I think the regex is the reason for the very high query time: Solr searches
> across 2 million docs with a regex. The query takes 2 seconds, which is too
> much for autocomplete. A user who types "manchester united" makes Solr do 16
> queries of 2 seconds each. Are there some other options? Faster
> solutions?
>
> I use solr 3.1
>
>


Re: Two unrelated questions

2011-09-21 Thread Erick Erickson
for <1> I don't quite get what you're driving at. Your DIH
query assigns the uniqueKey, it's not like it's something
auto-generated. Perhaps a concrete example would
help.

<2> There's a limit you can adjust that defaults to
1024 (maxBooleanClauses in solrconfig.xml). You can
bump this limit very high, but you're right: if anyone actually
does something absurd it'll slow *that* query down. But
just raising the limit won't change performance unless
someone actually puts a ton of items in a query...
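
For a sense of scale, a small SolrJ sketch (the field and model names are
invented) that builds such an OR query; with the default config, anything
past 1024 clauses trips maxBooleanClauses:

import org.apache.solr.client.solrj.SolrQuery;

public class BigOrQueryExample {
  public static void main(String[] args) {
    String[] models = {"Crown Vic", "Focus", "F-150"};  // could be hundreds
    StringBuilder q = new StringBuilder();
    for (int i = 0; i < models.length; i++) {
      if (i > 0) q.append(" OR ");
      // each quoted model becomes one boolean clause in the parsed query
      q.append("model:\"").append(models[i]).append('"');
    }
    SolrQuery query = new SolrQuery(q.toString());
    System.out.println(query.getQuery());
  }
}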

Best
Erick

On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron  wrote:
> Hi all-
>
> I'm not sure if I should break this out into two separate questions to the 
> list for searching purposes, or if one is more acceptable (don't want to 
> flood).
>
> I have two (hopefully) straightforward questions:
>
> 1. Is it possible to expose the unique ID of a document to a DIH query? The 
> reason I want to do this is because I use the unique ID of the row in the 
> table as the unique ID of the Lucene document, but I've noticed that the 
> count of documents doesn't match the count in the table; I'd like to add
> these rows and was hoping to avoid writing a custom SolrJ app to do it.
>
> 2. Is there any limit to the number of conditions in a Boolean search? We're 
> working on a new project where the user can choose either, for example, "Ford 
> Vehicles", in which case I can simply search for "Ford", but if the user 
> chooses specific makes and models, then I have to say something like "Crown 
> Vic OR Focus OR Taurus OR F-150", etc., where they could theoretically choose 
> every model of Ford ever made except one. This could lead to a *very* large 
> query, and I was worried both about whether it was even possible and about
> the impact on performance.
>
>
> Thanks, and I apologize if this really should be two separate messages.
>
> Ron


Implementing a custom ResourceLoader

2011-09-21 Thread Jithin Emmanuel
Hi,
As part of writing a Solr plugin I need to override the ResourceLoader. My
plugin is intended to be a stop word analyzer filter factory, and I need to
change the way stop words are fetched. My assumption is that overriding
ResourceLoader.getLines() will help me meet my target of fetching stop
word data from an external webservice.
Is this feasible? Or should I go about overriding the
Factory.inform(ResourceLoader) method? Kindly let me know how to achieve
this.
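
Here is a rough sketch of what I have in mind for the inform(ResourceLoader)
route (untested; the webservice URL is a placeholder and the Solr 3.x API is
assumed):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.Version;
import org.apache.solr.analysis.BaseTokenFilterFactory;
import org.apache.solr.common.ResourceLoader;
import org.apache.solr.util.plugin.ResourceLoaderAware;

public class WebServiceStopFilterFactory extends BaseTokenFilterFactory
    implements ResourceLoaderAware {

  private CharArraySet stopWords;

  // called once at core startup; fetch the stop words here instead of
  // reading them from a local file through the loader
  public void inform(ResourceLoader loader) {
    stopWords = new CharArraySet(Version.LUCENE_33, 16, true);  // ignore case
    try {
      URL url = new URL("http://example.com/stopwords.txt");  // placeholder
      BufferedReader in = new BufferedReader(
          new InputStreamReader(url.openStream(), "UTF-8"));
      String line;
      while ((line = in.readLine()) != null) {
        if (line.trim().length() > 0) stopWords.add(line.trim());
      }
      in.close();
    } catch (Exception e) {
      throw new RuntimeException("could not load stop words", e);
    }
  }

  public TokenStream create(TokenStream input) {
    return new StopFilter(Version.LUCENE_33, input, stopWords);
  }
}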

-- Thanks
Jithin


Re: Sort five random "Top Offers" to the top

2011-09-21 Thread Sujit Pal
Hi MOuli,

AFAIK (and I don't know that much about Solr), this feature does not
exist out of the box in Solr. One way to achieve it could be to
construct a DocSet with topoffer:true and intersect it with your result
DocSet, then select the first few off the intersection, randomly shuffle
them, sublist [0:5], and move the sublist to the top of the results like
QueryElevationComponent does. Actually, you may want to take a look at the
QueryElevationComponent code for inspiration (this is where I would have
looked if I had to implement something similar).
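
A lighter-weight variant could be done entirely from the client. A sketch
(it assumes a dynamic random_* sort field is defined in schema.xml, as in
the example schema, plus a boolean topoffer field; other names are made up):

import java.util.Random;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class TopOffersExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // 1) five random top offers: filter on the flag, sort by a random field
    SolrQuery top = new SolrQuery("handy");
    top.addFilterQuery("topoffer:true");
    top.setRows(5);
    top.addSortField("random_" + new Random().nextInt(10000), SolrQuery.ORDER.asc);
    SolrDocumentList topOffers = server.query(top).getResults();

    // 2) the normal result list; duplicates of the five pinned docs would
    //    need to be filtered out client-side before rendering
    SolrQuery rest = new SolrQuery("handy");
    SolrDocumentList others = server.query(rest).getResults();

    System.out.println(topOffers.size() + " pinned, "
        + others.getNumFound() + " total matches");
  }
}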

-sujit
 
On Wed, 2011-09-21 at 06:54 -0700, MOuli wrote:
> Hey Community.
> 
> I got a Lucene/Solr index with many offers. Some of them are marked by a
> flag field "topoffer" as top offers. Now I want to randomly sort 5 of
> these offers to the top.
> 
> For Example
> HTC Sensation
>  - topoffer = true
> HTC Desire
>  - topoffer = false
> Samsung Galaxy S2
>  - topoffer = true
> IPhone 4
>  - topoffer = true 
> ...
> 
> When I search for a handy, I want the first 3 offers to be the HTC Sensation,
> Samsung Galaxy S2 and the iPhone 4.
> 
> 
> Does anyone have an idea?
> 
> PS: I hope my English is not too bad.
> 



Re: strange copied field problem

2011-09-21 Thread Pulkit Singhal
No probs. I would still hope someone comments on your thread with
some expert opinions about making a copy of a copy :)

On Wed, Sep 21, 2011 at 1:38 PM, Tanner Postert
 wrote:
> sure enough that worked. could have sworn we had it this way before, but
> either way, that fixed it. Thanks.
>
> On Wed, Sep 21, 2011 at 11:01 AM, Tanner Postert
> wrote:
>
>> i believe that was the original configuration, but I can switch it back and
>> see if that yields any results.
>>
>>
>> On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal 
>> wrote:
>>
>>> I am NOT claiming that making a copy of a copy field is wrong or leads
>>> to a race condition. I don't know that. BUT did you try to copy into
>>> the text field directly from the genre field? Instead of the
>>> genre_search field? Did that yield working queries?
>>>
>>> On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
>>>  wrote:
>>> > i have 3 fields that I am working with: genre, genre_search and text.
>>> genre
>>> > is a string field which comes from the data source. genre_search is a
>>> text
>>> > field that is copied from genre, and text is a text field that is copied
>>> > from genre_search and a few other fields. Text field is the default
>>> search
>>> > field for queries. When I search for q=genre_search:indie+rock, solr
>>> returns
>>> > several records that have both Indie as a genre and Rock as a genre,
>>> which
>>> > is great, but when I search for q=indie+rock or q=text:indie+rock, i get
>>> no
>>> > results.
>>> >
>>> > Why would the source field return the value and the destination
>>> wouldn't.
>>> > Both genre_search and text are the same data type, so there shouldn't be
>>> any
>>> > strange translations happening.
>>> >
>>>
>>
>>
>


Production Issue: SolrJ client throwing this error even though field type is not defined in schema

2011-09-21 Thread roz dev
Hi All

We are getting this error in our Production Solr Setup.

Message: Element type "t_sort" must be followed by either attribute
specifications, ">" or "/>".
Solr version is 1.4.1

The stack trace indicates that Solr is returning a malformed document.


Caused by: org.apache.solr.client.solrj.SolrServerException: Error
executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at 
com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
... 15 more
Caused by: org.apache.solr.common.SolrException: parsing error
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 17 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at
[row,col]:[3,136974]
Message: Element type "t_sort" must be followed by either attribute
specifications, ">" or "/>".
at 
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
... 21 more


Re: Debugging DIH by placing breakpoints

2011-09-21 Thread Pulkit Singhal
Correct! With that additional info, plus
http://wiki.apache.org/solr/HowToContribute (ant eclipse), plus a
refreshed (close/open) eclipse project ... I'm all set.

Thanks Again.

On Wed, Sep 21, 2011 at 1:43 PM, Gora Mohanty  wrote:
> On Thu, Sep 22, 2011 at 12:08 AM, Pulkit Singhal
>  wrote:
>> Hello,
>>
>> I was wondering where I can find the source code for DIH? I want to
>> check out the source and step through it breakpoint by breakpoint to
>> understand it better :)
>
> Should be under contrib/dataimporthandler in your Solr source
> tree.
>
> Regards,
> Gora
>


Re: Debugging DIH by placing breakpoints

2011-09-21 Thread Gora Mohanty
On Thu, Sep 22, 2011 at 12:08 AM, Pulkit Singhal
 wrote:
> Hello,
>
> I was wondering where I can find the source code for DIH? I want to
> check out the source and step through it breakpoint by breakpoint to
> understand it better :)

Should be under contrib/dataimporthandler in your Solr source
tree.

Regards,
Gora


Re: strange copied field problem

2011-09-21 Thread Tanner Postert
Sure enough, that worked. I could have sworn we had it this way before, but
either way, that fixed it. Thanks.

On Wed, Sep 21, 2011 at 11:01 AM, Tanner Postert
wrote:

> i believe that was the original configuration, but I can switch it back and
> see if that yields any results.
>
>
> On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal 
> wrote:
>
>> I am NOT claiming that making a copy of a copy field is wrong or leads
>> to a race condition. I don't know that. BUT did you try to copy into
>> the text field directly from the genre field? Instead of the
>> genre_search field? Did that yield working queries?
>>
>> On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
>>  wrote:
>> > i have 3 fields that I am working with: genre, genre_search and text.
>> genre
>> > is a string field which comes from the data source. genre_search is a
>> text
>> > field that is copied from genre, and text is a text field that is copied
>> > from genre_search and a few other fields. Text field is the default
>> search
>> > field for queries. When I search for q=genre_search:indie+rock, solr
>> returns
>> > several records that have both Indie as a genre and Rock as a genre,
>> which
>> > is great, but when I search for q=indie+rock or q=text:indie+rock, i get
>> no
>> > results.
>> >
>> > Why would the source field return the value and the destination
>> wouldn't.
>> > Both genre_search and text are the same data type, so there shouldn't be
>> any
>> > strange translations happening.
>> >
>>
>
>


Debugging DIH by placing breakpoints

2011-09-21 Thread Pulkit Singhal
Hello,

I was wondering where I can find the source code for DIH? I want to
check out the source and step through it breakpoint by breakpoint to
understand it better :)

Thanks!
- Pulkit


Re: Solr Indexing - Null Values in date field

2011-09-21 Thread Pulkit Singhal
Also you may use the ScriptTransformer to explicitly remove the field
from the document if the field is null. I do this for all my sdouble
and sdate fields. It's a bit manual, and I would like to see Solr
enhanced to simply skip stuff like this via a flag in its DIH
code, but until then it suffices:

... transformer="DateFormatTransformer,script:skipEmptyFields"

<script><![CDATA[
  function skipEmptyFields(row) {
    // drop a field when it is null/empty so Solr never sees an invalid
    // value ('someDateField' is a placeholder for each nullable column)
    if (row.get('someDateField') == null || row.get('someDateField') == '') {
      row.remove('someDateField');
    }
    return row;
  }
]]></script>


On Wed, Sep 21, 2011 at 6:06 AM, Gora Mohanty  wrote:
> On Wed, Sep 21, 2011 at 4:08 PM, mechravi25  wrote:
>> Hi,
>>
>> I have a field in my source with data type as string and that field has NULL
>> values. I am trying to index this field in solr as a date data type with
>> multivalued = true. Following is the entry for that field in my schema.xml
> [...]
>
> One cannot have NULL values as input for Solr date fields. The
> multivalued part is irrelevant here.
>
> As it seems like you are getting the input data from a database,
> you will need to supply some placeholder date for NULL date values.
> E.g., with mysql, we have:
> COALESCE( CreationDate, STR_TO_DATE( '1970,1,1', '%Y,%m,%d' ) )
> The required syntax will be different for other databases.
>
> Regards,
> Gora
>


Re: add quartz-like scheduling capabilities to solr-DIH

2011-09-21 Thread Pulkit Singhal
I think what Ahmet is trying to say is that such functionality does not exist.
As the functionality does not exist, there is no procedure or conf-file
related work to speak of.
There has been a request to have this work done, and you can vote/watch
for it here:
https://issues.apache.org/jira/browse/SOLR-1251

On Fri, Sep 16, 2011 at 7:35 AM, vighnesh  wrote:
> thanks iroxxx
>
>
> but how can I add quartz-like scheduling to Solr DIH? Are there any changes
> required in any of the configuration files? Please specify the procedure.
>
>


Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Pulkit Singhal
Usually any good piece of Java code refrains from catching Throwable,
so that Errors will bubble up, unlike exceptions. Having said that,
perhaps someone on the list can help if you share which particular
Solr version you are using where you suspect that the Error is being
eaten up.

On Fri, Sep 16, 2011 at 2:47 PM, Jason Toy  wrote:
> I have solr issues where I keep running out of memory. I am working on
> solving the memory issues (this will take a long time), but in the meantime,
> I'm trying to be notified when the error occurs.  I saw with the jvm I can
> pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time
> the out of memory issue occurs though my script never runs. Does solr let
> the error bubble up so that the jvm can call this script? If not how can I
> have a script run when solr gets an out of memory issue?
>


Re: strange copied field problem

2011-09-21 Thread Tanner Postert
I believe that was the original configuration, but I can switch it back and
see if that yields any results.

On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal wrote:

> I am NOT claiming that making a copy of a copy field is wrong or leads
> to a race condition. I don't know that. BUT did you try to copy into
> the text field directly from the genre field? Instead of the
> genre_search field? Did that yield working queries?
>
> On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
>  wrote:
> > i have 3 fields that I am working with: genre, genre_search and text.
> genre
> > is a string field which comes from the data source. genre_search is a
> text
> > field that is copied from genre, and text is a text field that is copied
> > from genre_search and a few other fields. Text field is the default
> search
> > field for queries. When I search for q=genre_search:indie+rock, solr
> returns
> > several records that have both Indie as a genre and Rock as a genre,
> which
> > is great, but when I search for q=indie+rock or q=text:indie+rock, i get
> no
> > results.
> >
> > Why would the source field return the value and the destination wouldn't.
> > Both genre_search and text are the same data type, so there shouldn't be
> any
> > strange translations happening.
> >
>


Re: strange copied field problem

2011-09-21 Thread Pulkit Singhal
I am NOT claiming that making a copy of a copy field is wrong or leads
to a race condition. I don't know that. BUT did you try to copy into
the text field directly from the genre field? Instead of the
genre_search field? Did that yield working queries?

On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
 wrote:
> i have 3 fields that I am working with: genre, genre_search and text. genre
> is a string field which comes from the data source. genre_search is a text
> field that is copied from genre, and text is a text field that is copied
> from genre_search and a few other fields. Text field is the default search
> field for queries. When I search for q=genre_search:indie+rock, solr returns
> several records that have both Indie as a genre and Rock as a genre, which
> is great, but when I search for q=indie+rock or q=text:indie+rock, i get no
> results.
>
> Why would the source field return the value and the destination wouldn't.
> Both genre_search and text are the same data type, so there shouldn't be any
> strange translations happening.
>


SolrCloud state

2011-09-21 Thread Miguel Coxo
Hi there.

I'm starting a new project using Solr and I would like to know if Solr is
able to set up a cluster with fault tolerance.

I'm setting up an environment with two shards. Each shard should have a
replica.

What I would like to know is: if a shard master fails, will the replica be
"promoted" to master? Or will it remain search-only and only recover when a
new master is set up?

Also, how is document indexing distributed across the shards? Can I add a
new shard dynamically?

All the best, Miguel Coxo.


Re: FW: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Yongtao Liu
I hit a similar issue recently.
Not sure if MMapDirectory is the right way to go.

When an index file is mapped to RAM, the JVM calls the OS file-mapping
function. The memory usage is shared memory; it may not be counted against
the JVM process space.

One problem I saw is that if the index file is bigger than physical RAM, and
there are lots of queries causing wide index file access, then the machine
ends up with no available memory and the system becomes very slow.

What I did was change the Lucene code to disable MMapDirectory.

On Wed, Sep 21, 2011 at 1:26 PM, Yongtao Liu  wrote:

>
>
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, September 20, 2011 3:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MMapDirectory failed to map a 23G compound index segment
>
> Since you hit OOME during mmap, I think this is an OS issue not a JVM
> issue.  Ie, the JVM isn't running out of memory.
>
> How many segments were in the unoptimized index?  It's possible the OS
> rejected the mmap because of process limits.  Run "cat
> /proc/sys/vm/max_map_count" to see how many mmaps are allowed.
>
> Or: is it possible you reopened the reader several times against the index
> (ie, after committing from Solr)?  If so, I think 2.9.x never unmaps the
> mapped areas, and so this would "accumulate" against the system limit.
>
> > My memory of this is a little rusty but isn't mmap also limited by mem +
> swap on the box? What does 'free -g' report?
>
> I don't think this should be the case; you are using a 64 bit OS/JVM so in
> theory (except for OS system wide / per-process limits imposed) you should
> be able to mmap up to the full 64 bit address space.
>
> Your virtual memory is unlimited (from "ulimit" output), so that's good.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Sep 7, 2011 at 12:25 PM, Rich Cariens 
> wrote:
> > Ahoy ahoy!
> >
> > I've run into the dreaded OOM error with MMapDirectory on a 23G cfs
> > compound index segment file. The stack trace looks pretty much like
> > every other trace I've found when searching for OOM & "map failed"[1].
> > My configuration
> > follows:
> >
> > Solr 1.4.1/Lucene 2.9.3 (plus
> > SOLR-1969
> > )
> > CentOS 4.9 (Final)
> > Linux 2.6.9-100.ELsmp x86_64 yada yada yada Java SE (build
> > 1.6.0_21-b06) Hotspot 64-bit Server VM (build 17.0-b16, mixed mode)
> > ulimits:
> >core file size (blocks, -c) 0
> >data seg size(kbytes, -d) unlimited
> >file size (blocks, -f) unlimited
> >pending signals(-i) 1024
> >max locked memory (kbytes, -l) 32
> >max memory size (kbytes, -m) unlimited
> >open files(-n) 256000
> >pipe size (512 bytes, -p) 8
> >POSIX message queues (bytes, -q) 819200
> >stack size(kbytes, -s) 10240
> >cpu time(seconds, -t) unlimited
> >max user processes (-u) 1064959
> >virtual memory(kbytes, -v) unlimited
> >file locks(-x) unlimited
> >
> > Any suggestions?
> >
> > Thanks in advance,
> > Rich
> >
> > [1]
> > ...
> > java.io.IOException: Map failed
> >  at sun.nio.ch.FileChannelImpl.map(Unknown Source)
> >  at
> > org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown
> > Source)
> >  at
> > org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown
> > Source)
> >  at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
> >  at org.apache.lucene.index.SegmentReader$CoreReaders.(Unknown
> > Source)
> >
> >  at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> >  at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> >  at org.apache.lucene.index.DirectoryReader.(Unknown Source)
> >  at org.apache.lucene.index.ReadOnlyDirectoryReader.(Unknown
> > Source)
> >  at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source)
> >  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown
> > Source)
> >  at org.apache.lucene.index.DirectoryReader.open(Unknown Source)
> >  at org.apache.lucene.index.IndexReader.open(Unknown Source) ...
> > Caused by: java.lang.OutOfMemoryError: Map failed
> >  at sun.nio.ch.FileChannelImpl.map0(Native Method) ...
> >


strange copied field problem

2011-09-21 Thread Tanner Postert
I have 3 fields that I am working with: genre, genre_search and text. genre
is a string field which comes from the data source; genre_search is a text
field that is copied from genre; and text is a text field that is copied
from genre_search and a few other fields. The text field is the default search
field for queries. When I search for q=genre_search:indie+rock, Solr returns
several records that have both Indie as a genre and Rock as a genre, which
is great, but when I search for q=indie+rock or q=text:indie+rock, I get no
results.

Why would the source field return the value when the destination doesn't?
Both genre_search and text are the same data type, so there shouldn't be any
strange translations happening.


Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Shawn Heisey

On 9/20/2011 4:09 PM, Robert Muir wrote:
yes, mergeFactor=10 is interpreted as both segmentsPerTier and
maxMergeAtOnce. yes, specifying explicit TieredMP parameters will
override whatever you set in mergeFactor (which is basically only
interpreted to be backwards compatible). this is why i created this
confusing test configuration: to test this exact case.


I've got a checked out lucene_solr_3_4 and this isn't what I'm seeing.
Solr Implementation Version: 3.4-SNAPSHOT 1173320M - root - 2011-09-21 
09:58:58


With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to 
be ignored.  I've got both set to 35, but Solr is merging every 10 
segments.  I haven't tried explicitly setting mergeFactor yet to see if 
that will make the other settings override it, I'm letting the current 
import finish first.


Here are the relevant config pieces. These two sections are in separate
files incorporated into solrconfig.xml using xinclude:



<indexDefaults>
  <useCompoundFile>false</useCompoundFile>

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">4</int>
  </mergeScheduler>

  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <!-- unidentified setting (value: 1) -->
  <lockType>native</lockType>
</indexDefaults>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

Thanks,
Shawn



Re: How to write core's name in log

2011-09-21 Thread Pulkit Singhal
Not sure if this is a good lead for you, but when I run the out-of-the-box
multi-core example-DIH instance of Solr, I often see the core name thrown
about in the logs. Perhaps you can look there?
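
In case it helps, the MDC part of the idea is simple on its own. A small
log4j sketch (where exactly to hook this into Solr's request path is the
open question):

import org.apache.log4j.Logger;
import org.apache.log4j.MDC;

public class CoreLogging {
  // wrap per-request work so log4j can print %X{core} in its ConversionPattern
  public static void withCore(String coreName, Runnable work) {
    MDC.put("core", coreName);
    try {
      work.run();
    } finally {
      MDC.remove("core");  // avoid leaking the name into unrelated log lines
    }
  }

  public static void main(String[] args) {
    withCore("core0", new Runnable() {
      public void run() {
        Logger.getLogger(CoreLogging.class).info("handled a request");
      }
    });
  }
}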

On Thu, Sep 15, 2011 at 6:50 AM, Joan  wrote:
> Hi,
>
> I have multiple cores in Solr and I want to write the core name to the log
> through log4j.
>
> I've found in SolrException a method called log(Logger log, Throwable e), but
> when it tries to build an Exception it doesn't have the core's name.
>
> The Exception is built in the toStr() method of the SolrException class, so I
> want to write the core's name into the message of the Exception.
>
> I'm thinking of adding an MDC variable holding the name of the core. Then
> I'll use it in the log4j configuration, as %X{core} in the ConversionPattern.
>
> The idea is that when Solr receives a request, I'll add this new variable
> "name of core".
>
> But I don't know if it's a good idea or not.
>
> Or does a solution already exist for adding the name of the core to the log?
>
> Thanks
>
> Joan
>


Best Practices for indexing nested XML in Solr via DIH

2011-09-21 Thread Pulkit Singhal
Hello Everyone,

I was wondering what best practices everyone follows for indexing nested
XML into Solr. Please don't feel limited by the examples; feel free to share
your own experiences.

Given an xml structure such as the following:


<categoryPath>
  <category>
    <id>cat001</id>
    <name>Everything</name>
  </category>
  <category>
    <id>cat002</id>
    <name>Music</name>
  </category>
  <category>
    <id>cat003</id>
    <name>Pop</name>
  </category>
</categoryPath>

How do you make the best use of the data when indexing?

1) Do you use Scenario A?
categoryPath_category_id = cat001 cat002 cat003 (flattened)
categoryPath_category_name = Everything Music Pop (flattened)
If so then how do you manage to find the corresponding
categoryPath_category_id if someone's search matches a value in the
categoryPath_category_name field? I understand that Solr is not about
lookups but this may be important information for you to display right
away as part of the search results page rendering.

2) Do you use Scenario B?
categoryPath_category_id = [cat001 cat002 cat003] (the [] signifies a
multi-value field)
categoryPath_category_name = [Everything Music Pop] (the [] signifies
a multi-value field)
And once again how do you find associated data sets once something matches.
Side Question: How can one configure DIH to store the data this way
for Scenario B?

Thanks!
- Pulkit


Selective values for facets

2011-09-21 Thread ntsrikanth
Hi,

 The dataset I have got is for special offers. 
We got lot of offer codes. But I need to create few facets for specific
conditions only.

For example, I got the following codes: ABCD, AGTR, KUYH, NEWY, NEWA, NEWB,
EAS1, EAS2

And I need to create a facet like 
'New Year Offers' mapped with NEWA, NEWB, NEWY and
'Easter Offers' mapped with EAS1, EAS2

I don't want other codes returned in the facet when I query it. How can I
make the other values be ignored while creating the facet at indexing
time?

Thanks,
Srikanth NT





LocalParams, bq, and highlighting

2011-09-21 Thread Demian Katz
I've run into another strange behavior related to LocalParams syntax in Solr 
1.4.1.  If I apply Dismax boosts using bq in LocalParams syntax, the contents 
of the boost queries get used by the highlighter.  Obviously, when I use bq as 
a separate parameter, this is not an issue.

To clarify, here are two searches that yield identical results but different 
highlighting behaviors:

http://localhost:8080/solr/biblio/select/?q=john&rows=20&start=0&indent=yes&qf=author^100&qt=dismax&bq=author%3Asmith^1000&fl=score&hl=true&hl.fl=*

http://localhost:8080/solr/biblio/select/?q=%28%28_query_%3A%22{!dismax+qf%3D\%22author^100\%22+bq%3D\%27author%3Asmith^1000\%27}john%22%29%29&rows=20&start=0&indent=yes&fl=score&hl=true&hl.fl=*

Query #1 highlights only "john" (the desired behavior), but query #2 highlights 
both "john" and "smith".

Is this a known limitation of the highlighter, or is it a bug?  Is this issue 
resolved in newer versions of Solr?

thanks,
Demian


Re: boost a document which has a field not empty

2011-09-21 Thread Zoltan Altfatter
Yes, I am using edismax and the bq parameter did the trick. Thanks a lot.

On Wed, Sep 21, 2011 at 3:59 PM, Ahmet Arslan  wrote:

> > I have one entity called organisation. I am indexing their
> > name to be able
> > to search afterwards on their name.
> > I store also the website of the organisation. Some
> > organisations have a
> > website some don't.
> > Can I achieve that when searching for organisations even if
> > I have a match
> > on their name I will show first those which have a
> > website.
>
> Which query parser are you using? lucene? (e)dismax?
>
> If lucene (default one), you can add an optional clause to your query:
>
> &q=+(some query) website:[* TO *]^10 (assuming you have OR as default
> operator)
>
> If dismax, there is a bq parameter which accepts lucene query syntax
> &bq=website:[* TO *]^10
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
>


Re: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Robert Muir
On Tue, Sep 20, 2011 at 12:32 PM, Michael McCandless
 wrote:
>
> Or: is it possible you reopened the reader several times against the
> index (ie, after committing from Solr)?  If so, I think 2.9.x never
> unmaps the mapped areas, and so this would "accumulate" against the
> system limit.

In order to unmap in Lucene 2.9.x you must specifically turn this
unmapping on with setUseUnmapHack(true)
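
A minimal sketch of turning that on (Lucene 2.9-era API; the index path is a
placeholder):

import java.io.File;
import org.apache.lucene.store.MMapDirectory;

public class UnmapExample {
  public static void main(String[] args) throws Exception {
    MMapDirectory dir = new MMapDirectory(new File("/path/to/index"));
    dir.setUseUnmapHack(true);  // let Lucene unmap buffers when inputs close
    dir.close();
  }
}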

-- 
lucidimagination.com


Re: boost a document which has a field not empty

2011-09-21 Thread Ahmet Arslan
> I have one entity called organisation. I am indexing their
> name to be able
> to search afterwards on their name.
> I store also the website of the organisation. Some
> organisations have a
> website some don't.
> Can I achieve that when searching for organisations even if
> I have a match
> on their name I will show first those which have a
> website.

Which query parser are you using? lucene? (e)dismax?

If lucene (default one), you can add an optional clause to your query:

&q=+(some query) website:[* TO *]^10 (assuming you have OR as default operator)

If dismax, there is a bq parameter which accepts lucene query syntax 
&bq=website:[* TO *]^10

http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
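
For completeness, setting the same boost from SolrJ might look like this
(a sketch):

import org.apache.solr.client.solrj.SolrQuery;

public class BoostQueryExample {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("some query");
    q.set("defType", "dismax");
    // organisations with any website value get boosted to the top
    q.set("bq", "website:[* TO *]^10");
    System.out.println(q);
  }
}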


Sort five random "Top Offers" to the top

2011-09-21 Thread MOuli
Hey Community.

I got a Lucene/Solr index with many offers. Some of them are marked by a
flag field "topoffer" as top offers. Now I want to randomly sort 5 of
these offers to the top.

For Example
HTC Sensation
 - topoffer = true
HTC Desire
 - topoffer = false
Samsung Galaxy S2
 - topoffer = true
IPhone 4
 - topoffer = true 
...

When I search for a handy, I want the first 3 offers to be the HTC Sensation,
Samsung Galaxy S2 and the iPhone 4.


Does anyone have an idea?

PS: I hope my English is not too bad.



Re: boost a document which has a field not empty

2011-09-21 Thread Alexei Martchenko
Can you assign a doc boost at index time?

2011/9/21 Zoltan Altfatter 

> Hi,
>
> I have one entity called organisation. I am indexing their name to be able
> to search afterwards on their name.
> I store also the website of the organisation. Some organisations have a
> website some don't.
> Can I achieve that when searching for organisations even if I have a match
> on their name I will show first those which have a website.
>
> Thank you.
>
> Regards,
> Zoltan
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Problem using EdgeNGram

2011-09-21 Thread O. Klein
Try using KeywordTokenizerFactory instead of StandardTokenizerFactory to get
the results you want.





Re: JSON response with SolrJ

2011-09-21 Thread Parvin Gasimzade
Hi,

Similar question asked before. Maybe it can help:
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-td1002024.html
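
If all you need is the raw JSON, one workaround is to skip SolrJ's response
parsing and hit the HTTP API directly with wt=json (a sketch; the URL and
query are placeholders):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class RawJsonQuery {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8983/solr/select?q=*:*&wt=json");
    BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"));
    StringBuilder json = new StringBuilder();
    String line;
    while ((line = in.readLine()) != null) {
      json.append(line).append('\n');
    }
    in.close();
    System.out.println(json);  // the JSON response body, unparsed
  }
}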

On Wed, Sep 21, 2011 at 3:01 PM, Kissue Kissue  wrote:

> Hi,
>
> I am using solr 3.3 with SolrJ. Does anybody have any idea how i can
> retrieve JSON response with SolrJ? Is it possible? It seems to be more
> focused on XML and Beans.
>
> Thanks.
>


JSON response with SolrJ

2011-09-21 Thread Kissue Kissue
Hi,

I am using Solr 3.3 with SolrJ. Does anybody have any idea how I can
retrieve a JSON response with SolrJ? Is it possible? It seems to be more
focused on XML and Beans.

Thanks.


Problem using EdgeNGram

2011-09-21 Thread Kissue Kissue
Hi,

I am using Solr 3.3 with SolrJ. I am trying to use EdgeNGram to power the
auto-suggest feature in my application. My understanding is that using
EdgeNGram would mean that results are only returned for records starting with
the search criteria, but this is not happening for me.

For example, if I search for "tr", I get results like the following:

Greenham Trading 6
IT Training Publications
AA Training

Below are details of my configuration:

<fieldType name="text_autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas why this is happening would be much appreciated.

Thanks.


Fuzzy Suggester

2011-09-21 Thread O. Klein
From http://wiki.apache.org/solr/Suggester:

JaspellLookup can provide "fuzzy" suggestions, though this functionality is
not currently exposed (it's a one line change in JaspellLookup).

Anybody know what change this would have to be?



Re: Solr Indexing - Null Values in date field

2011-09-21 Thread Gora Mohanty
On Wed, Sep 21, 2011 at 4:08 PM, mechravi25  wrote:
> Hi,
>
> I have a field in my source with data type as string and that field has NULL
> values. I am trying to index this field in solr as a date data type with
> multivalued = true. Following is the entry for that field in my schema.xml
[...]

One cannot have NULL values as input for Solr date fields. The
multivalued part is irrelevant here.

As it seems like you are getting the input data from a database,
you will need to supply some placeholder date for NULL date values.
E.g., with mysql, we have:
COALESCE( CreationDate, STR_TO_DATE( '1970,1,1', '%Y,%m,%d' ) )
The required syntax will be different for other databases.

Regards,
Gora


Solr Indexing - Null Values in date field

2011-09-21 Thread mechravi25
Hi,

I have a field in my source with data type string, and that field has NULL
values. I am trying to index this field in Solr as a date data type with
multiValued = true. Following is the entry for that field in my schema.xml:

<field name="startdate" type="tdate" indexed="true" stored="true" multiValued="true"/>

When I try to index, I get the following exception

org.apache.solr.common.SolrException: Invalid Date String:''
at org.apache.solr.schema.DateField.parseMath(DateField.java:163)
at 
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:171)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:95)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:618)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)


I even tried giving an IFNULL condition in my query for that field (e.g.
IFNULL(startdate,'') and also IFNULL(startdate,NULL)), but I am still getting
the same exception.

Is there any way to index the null values as such in a date field?

Please help.





boost a document which has a field not empty

2011-09-21 Thread Zoltan Altfatter
Hi,

I have one entity called organisation. I am indexing its name to be able
to search on the name afterwards.
I also store the website of the organisation. Some organisations have a
website; some don't.
When searching for organisations, can I arrange that, among the name matches,
those which have a website are shown first?

Thank you.

Regards,
Zoltan