Re: Facet pivot 50.000.000 different values

2013-05-18 Thread Carlos Bonilla
Hi Mikhail,
yes the thing is that I need to take into account different queries and
that's why I can't use the Terms Component.

Cheers.


2013/5/17 Mikhail Khludnev mkhlud...@griddynamics.com

 On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla
 carlosbonill...@gmail.comwrote:

  We
  only need to calculate how many different B values have more than 1
  document but it takes ages
 

 Carlos,
 It's not clear whether you need to take the results of a query into account or
 just gather statistics from the index. If the latter, you can just enumerate the
 terms and look at TermsEnum.docFreq(). Am I getting it right?
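The counting logic Mikhail describes can be sketched in Python over a toy in-memory index (an assumption for illustration — against a real Lucene index you would walk a TermsEnum instead of a dict):

```python
# Toy inverted index: term -> ids of documents containing it. In Lucene,
# TermsEnum iterates the terms and docFreq() gives the posting count.
index = {
    "b_value_1": {1, 7, 9},   # appears in 3 documents
    "b_value_2": {4},         # appears in 1 document
    "b_value_3": {2, 5},      # appears in 2 documents
}

# Count how many distinct B values occur in more than one document.
multi_doc_terms = sum(1 for docs in index.values() if len(docs) > 1)
print(multi_doc_terms)  # 2
```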


 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Best query method

2013-05-18 Thread J Mohamed Zahoor
Hi

I am using solr 4.2.1. 

My index has products from different stores with different attributes.

If I want to get the count of all products that belong to store X, are 
coloured red, and are in stock…


My question is: which way of querying is better in terms of performance and 
cache usage?


1) q=*:*&fq=(store:X) AND (colour:red) AND (in-stock:true)

2) q=store:X&fq=(colour:red) AND (in-stock:true)

3) q=store:X&fq=colour:red&fq=in-stock:true

If there is any other option better than these three, please let me know.

I am assuming that whichever filter eliminates more products should come 
first (q, then the list of fq's).



./zahoor
 

Re: Searching for terms having embedded white spaces like word1 word2

2013-05-18 Thread kobe.free.wo...@gmail.com
Thank you so very much Jack for your prompt reply. Your solution worked for
us.

I have another issue in querying fields having values of the sort
<string>This is good</string><string>This is also good</string><string>This
is excellent</string>. I want to perform 'StartsWith' as well as 'Contains'
searches on this field. The field definition is as follows:

  <fieldType name="cust_str" class="solr.TextField"
      positionIncrementGap="100" sortMissingLast="true">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Please suggest how to perform the above-mentioned search.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-for-terms-having-embedded-white-spaces-like-word1-word2-tp4064170p4064355.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching for terms having embedded white spaces like word1 word2

2013-05-18 Thread Jack Krupansky
Ideally, such a text search should be done using tokenized text and span 
query. Maybe you could do it using the surround query parser, but you 
should be able to do it using the LucidWorks Search query parser:


this is BEFORE:1 (good OR excellent)

But, given that you have a keyword tokenizer with embedded white space, you 
should be able to write a Lucene regex query for the same as raw text, 
something like [untested!]:


/this\\s+is\\s+(\\w+\\s+)?(good|excellent)/

That would be contains.

Starts with:

/^this\\s+is\\s+(\\w+\\s+)?(good|excellent)/

Ends with:

/this\\s+is\\s+(\\w+\\s+)?(good|excellent)$/

Exact match:

/^this\\s+is\\s+(\\w+\\s+)?(good|excellent)$/

Caveat:
BUT... such character-level regex matching is NOT guaranteed to be speedy 
and really should only be used for relatively small datasets.
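The intent of the four anchoring variants can be sanity-checked with ordinary regexes. Here is a quick Python check (single backslashes, since the doubled backslashes above are presumably escaping for the query parser; `\w+` rather than `\w` is needed for the optional middle word "also" to match):

```python
import re

docs = [
    "this is good",
    "this is also good",
    "this is excellent",
    "well this is good indeed",   # contains but neither starts nor ends
]

contains = re.compile(r"this\s+is\s+(\w+\s+)?(good|excellent)")
starts   = re.compile(r"^this\s+is\s+(\w+\s+)?(good|excellent)")
ends     = re.compile(r"this\s+is\s+(\w+\s+)?(good|excellent)$")
exact    = re.compile(r"^this\s+is\s+(\w+\s+)?(good|excellent)$")

print([bool(contains.search(d)) for d in docs])  # [True, True, True, True]
print([bool(starts.search(d)) for d in docs])    # [True, True, True, False]
print([bool(ends.search(d)) for d in docs])      # [True, True, True, False]
print([bool(exact.search(d)) for d in docs])     # [True, True, True, False]
```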


-- Jack Krupansky




Re: Best query method

2013-05-18 Thread Jack Krupansky
You'll have to decide whether cached or uncached filter queries work best 
for your particular application. If you can use cached filter queries, that's 
better, and then separating or factoring the filter query terms is better.


But if you have so much data or so little memory or such complex queries 
that caching is too expensive, you can go with uncached filter queries. You 
can then also assign a cost to each filter query to control the order they 
are executed:


Example: q=*:*&fq={!cache=false cost=5}inStock:true&fq={!frange l=1 u=4 
cache=false cost=50}sqrt(popularity)


See:
http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

But, start simple, with separate, cached, filter queries, and only get fancy 
if you have problems with query latency.
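For option 3 from the original question, note that fq simply repeats as a query parameter, and each filter is cached independently. A small Python sketch of building such a request (field names are the ones from the thread; rows=0 is an assumption, since only the count is wanted):

```python
from urllib.parse import urlencode

# Repeating ("fq", ...) tuples produces the separate filter queries of
# option 3; Solr caches each fq entry in its filter cache on its own.
params = [
    ("q", "store:X"),
    ("fq", "colour:red"),
    ("fq", "in-stock:true"),
    ("rows", "0"),  # assumption: only numFound is needed, not the docs
]
query_string = urlencode(params)
print(query_string)
# q=store%3AX&fq=colour%3Ared&fq=in-stock%3Atrue&rows=0
```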


-- Jack Krupansky




Re: Adding field in Schema.xml

2013-05-18 Thread Kamal Palei
Hi Alex,
Where do I need to mention the types? Kindly tell me in detail.

I use the Drupal framework. It provides a schema file. In that there are
already some long-type fields, and these are actually shown by Solr as part
of the index.

Whatever long field I am adding does not show up as part of the index.

Best Regards
kamal


On Fri, May 17, 2013 at 7:47 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 Do you have the types corresponding to those fields present?
 Specifically, long. You don't get any special type names out of the
 box, they all need to be present in types area.

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Fri, May 17, 2013 at 8:49 AM, Kamal Palei palei.ka...@gmail.com
 wrote:
  Hi All
  I am trying to add few fields in schema.xml file as below.
 
  <field name="salary" type="long" indexed="true" stored="true"/>
  <field name="experience" type="long" indexed="true" stored="true"/>
  *<field name="last_updated_date" type="tdate" indexed="true"
  stored="true" default="NOW" multiValued="false"/>*
 
  <dynamicField name="rs_*" type="long" indexed="true" stored="true"
  multiValued="false"/>
  <dynamicField name="rd_*" type="tdate" indexed="true" stored="true"
  multiValued="false"/>
 
  Only the last_updated_date (the one in bold letters) is getting added. Is
  there any syntax issue with the other 4 entries? Kindly let me know.
 
  Thanks
  kamal



Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-18 Thread adityab
These numbers are really great. Would you mind sharing your h/w configuration
and JVM params?

thanks 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrading-from-SOLR-3-5-to-4-2-1-Results-tp4064266p4064370.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-18 Thread Jason Hellman
Rishi,

Fantastic!  Thank you so very much for sharing the details.

Jason

On May 17, 2013, at 12:29 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

 
 
 Hi All,
 
 It's Friday 3:00pm, warm & sunny outside, and it was a good week. Figured I'd 
 share some good news.
 I work for the AOL mail team and we use SOLR for our mail search backend. 
 We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR 
 community.
 We deal with millions of indexes and billions of requests a day across our 
 complex.
 We finished the full rollout of SOLR 4.2.1 into production last week. 
 
 Some key highlights:
 - ~75% Reduction in Search response times
 - ~50% Reduction in SOLR Disk busy, which in turn helped with ~90% Reduction 
 in errors
 - Garbage collection total stop reduction by over 50% moving application 
 throughput into the 99.8% - 99.9% range
 - ~15% reduction in CPU usage
 
 We did not tune our application moving from 3.5 to 4.2.1 nor update java.
 For the most part it was a binary upgrade, with patches for our special use 
 case.  
 
 Now going forward we are looking at prototyping SOLR Cloud for our search 
 system, upgrade java and tomcat, tune our application further. Lots of fun 
 stuff :)
 
 Have a great weekend everyone. 
 Thanks,
 
 Rishi. 
 
 
 
 



Re: Java heap space exception in 4.2.1

2013-05-18 Thread J Mohamed Zahoor

aah… I was doing a facet on a double field that had 6 decimal places…
No surprise that the Lucene cache got full…

.z/ahoor

On 17-May-2013, at 11:56 PM, J Mohamed Zahoor zah...@indix.com wrote:

 Memory increase a lot with queries which have facets… 
 
 
 ./Zahoor
 
 
 On 17-May-2013, at 10:00 PM, Shawn Heisey s...@elyograg.org wrote:
 
 On 5/17/2013 1:17 AM, J Mohamed Zahoor wrote:
  I moved to 4.2.1 from 4.1 recently… everything was working fine until I 
  added a few more stats queries.
  Now I am getting this error frequently; Solr does not run even for 2 
  minutes continuously.
  All 5GB is getting used instantaneously in a few queries...
 
 Someone on IRC ran into memory problems upgrading from 4.0 to 4.2.  It
 wasn't OOM errors, they were just using a lot more heap than they were
 before and running into constant full garbage collections.
 
 There is another message on this list about someone who upgraded from
 3.5 to 4.2 and is having memory troubles.
 
 The person on IRC made most of their fields unstored and reindexed,
 which fixed the problem for them.  They only needed a few fields stored.
 
 Because the IRC user was on 4.0, I originally thought it had something
 to do with compressed stored fields, but on this thread, they started
 with 4.1.  If that was the released 4.1.0 and not a SNAPSHOT version,
 then they had compressed stored fields before the upgrade.
 
 The user on IRC was not using termvectors or docvalues, which would be
 potential pain points unique to 4.2.
 
 I'm using 4.2.1 with no trouble in my setup, but I do have a heap that's
 considerably larger than I need.  There are no apparent memory leaks -
 it's been running for over a month with updates once a minute.  I've
 finally switched over from the 3.5.0 index to the new one, so for the
 last few days, it has been also taking our full query load.
 
 What could have changed between 4.1 and 4.2 to cause dramatically
 increased memory usage?
 
 From my /admin/system:
 
 date name=startTime2013-04-05T15:52:55.751Z/date
 
 Thanks,
 Shawn
 
 



Re: having trouble storing large text blob fields - returns binary address in search results

2013-05-18 Thread geeky2
hello

your comment made me think - so I decided to double-check myself.

I opened up the schema in SQuirreL and made sure that the two columns in
question were actually of type TEXT in the schema - check

I went into the db-config.xml and removed all references to
ClobTransformer, removed the cast directives from the fields as well as the
clob=true on the two fields - I pasted the db-config.xml below for
reference - check

I restarted JBoss - thus restarting Solr - check

I went into the Solr dataimport admin screen and did a clean import - check

after the import was complete, I queried a part that I knew would have one
of the clob fields - results are pasted below as well - you can see the
binary address in the field.


<?xml version="1.0"?>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="accessoryIndicator">N</str>
    *<str name="attributes">[B@5b372219</str>*
    <str name="availabilityStatus">PIA</str>
    <arr name="divProductTypeDesc">
      <str>Refrigerators and Freezers</str>
    </arr>
    <str name="divProductTypeId">0046</str>
    <str name="id">12001892,0046,464</str>
    <str name="itemModelDesc">VALVE, WATER</str>
    <str name="itemModelNo">12001892</str>
    <str name="itemModelNoExactMatchStr">12001892</str>
    <int name="itemType">1</int>
    <str name="otcStockIndicator">Y</str>
    <int name="partCnt">1</int>
    <str name="partCondition">N</str>
    <arr name="plsBrandDesc">
      <str/>
    </arr>
    <str name="plsBrandId">464</str>
    <str name="productIndicator">N</str>
    <int name="rankNo">13</int>
    <float name="sellingPrice">53.54</float>
    <str name="sourceOrderNo">464 </str>
    <str name="subbedFlag">Y</str>
  </doc>
</result>








<document>
  <entity transformer="TemplateTransformer" name="core1-parts"
    query="select
             summ.*,
             1 as item_type,
             1 as part_cnt,
             '' as brand,
             mst.acy_prt_fl,
             mst.dil_tx,
             mst.hzd_mtl_typ_cd,
             mst.otc_cre_stk_fl,
             mst.prd_fl,
             mst.prt_cmt_tx,
             mst.prt_cnd_cd,
             mst.prt_inc_qt,
             mst.prt_made_by,
             mst.sug_qt,
             att.attr_val,
             rsr.rsr_val,
             case when sub.orb_itm_id is null then 'N' else 'Y' end as subbed_flag
           from
             prtxtps_prt_summ as summ
             left outer join prtxtpm_prt_mast as mst on mst.orb_itm_id =
               summ.orb_itm_id and mst.prd_gro_id = summ.prd_gro_id and mst.spp_id =
               summ.spp_id
             left outer join tmpxtpa_prt_attr as att on att.orb_itm_id =
               summ.orb_itm_id and att.prd_gro_id = summ.prd_gro_id and att.spp_id =
               summ.spp_id
             left outer join tmpxtpr_prt_rsr as rsr on rsr.orb_itm_id =
               summ.orb_itm_id and rsr.prd_gro_id = summ.prd_gro_id and rsr.spp_id =
               summ.spp_id
             left outer join tmpxtps_prt_sub as sub on sub.orb_itm_id =
               summ.orb_itm_id and sub.prd_gro_id = summ.prd_gro_id and sub.spp_id =
               summ.spp_id
           where
             summ.spp_id = '464'">

    <field column="id" name="id"
      template="${core1-parts.orb_itm_id},${core1-parts.prd_gro_id},${core1-parts.spp_id}"/>
    <field column="orb_itm_id" name="itemModelNo"/>
    <field column="prd_gro_id" name="divProductTypeId"/>
    <field column="ds_tx" name="itemModelDesc"/>
    <field column="spp_id" name="plsBrandId"/>
    <field column="rnk_no" name="rankNo"/>
    <field column="item_type" name="itemType"/>
    <field column="brand" name="plsBrandDesc"/>
    <field column="prd_gro_ds" name="divProductTypeDesc"/>
    <field column="part_cnt" name="partCnt"/>
    <field column="avail" name="availabilityStatus"/>
    <field column="price" name="sellingPrice"/>
    <field column="prt_son" name="sourceOrderNo"/>
    <field column="prt_src_cd" name="sourceIdCode"/>
    <field column="rte_cd" name="sourceRouteCode"/>

    <field column="acy_prt_fl" name="accessoryIndicator"/>
    <field column="dil_tx" name="disclosure"/>
    <field column="hzd_mtl_typ_cd" name="hazardousMaterialCode"/>
    <field column="otc_cre_stk_fl" name="otcStockIndicator"/>
    <field column="prd_fl" name="productIndicator"/>
    <field column="prt_cmt_tx" name="comment"/>
    <field column="prt_cnd_cd" name="partCondition"/>
    <field column="prt_inc_qt" name="qtyIncluded"/>
    <field column="prt_made_by" name="madeBy"/>
    <field column="sug_qt" name="suggestedQty"/>

    <field column="attr_val" name="attributes"/>
    <field column="rsr_val" name="restrictions"/>

    <field column="subbed_flag"

Wide vs Tall document in Solr 4.2.1

2013-05-18 Thread adityab
Hi, 
We recently decided to move from Solr version 3.5 to 4.2.1. The transition
seems to have been smooth from a development point of view, but I see some
intermittent issues with our cluster.
Some information: we use the classic Master/Slave model (we have plans to move
to Cloud v4.3).

#documents: 300K, with around 150 fields (including dynamic)
index size: 10GB

Most of the fields are multiValued (type String) and the size of the array in
those varies from 5 to 50K. So 30% of our popular documents are tall. Not all
information in these multivalued fields is required, so at the application layer
we loop and eliminate the unwanted values. They are stored in this fashion
because of the 1-to-many mapping in the SQL DB.

The issue we observed is high CPU and memory utilization while retrieving
these documents with large multivalued fields.
So my question is whether it's possible to make this tall document into a wide
document so only the required information is fetched. Is this a better approach
to look for? Any other thoughts are welcome.

thanks
Aditya 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wide-vs-Tall-document-in-Solr-4-2-1-tp4064409.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-18 Thread Furkan KAMACI
I have read this about ZooKeeper:

Zookeeper servers have an active connections limit, which by default is
30. Do you define it higher than 30 for Solr?
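For reference, the per-client connection limit is the `maxClientCnxns` setting in zoo.cfg (it is applied per client IP, not globally); raising it would look something like this — the value 60 here is only an illustration:

```
# zoo.cfg
# Maximum concurrent connections a single client IP may open to this server
maxClientCnxns=60
```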

2013/5/17 vsilgalis vsilga...@gmail.com

 As an example, I have 9 SOLR nodes (3 clusters of 3) using different
 versions
 of SOLR (4.1, 4.1, and 4.2.1), utilizing the same zookeeper ensemble (3
 servers), using chroot for the different configs across clusters.

 My zookeeper servers are just VMs, dual-core with 1GB of RAM and are only
 used for SOLRCloud

 JVM settings for zookeeper for heap size are start 256MB and max heap size
 of 512MB or: -Xms256m -Xmx512m

 I have never seen it use more than the specified start heap size of 256MB.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Zookeeper-Ensemble-Startup-Parameters-For-SolrCloud-tp4063905p4064279.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wide vs Tall document in Solr 4.2.1

2013-05-18 Thread Chris Hostetter

: We recently decided to move from Solr version 3.5 to 4.2.1. The transition
...
: Most of the fields are multiValued (type String) and the size of array in
: those vary from 5 to 50K. So our 30% of popular documents are tall. Not all
...
: Issues that we observed is high CPU and Memory utilization while retrieving
: these document with large multivalued fields.

Are you certain you are using 4.2.1 and not 4.2?

There was a particularly bad bug related to enableLazyFieldLoading 
affecting Solr 4.0, 4.1, and 4.2, but it should *not* affect 4.2.1...

https://issues.apache.org/jira/browse/SOLR-4589

If you are seeing slow response times and heavy CPU spikes, it would help 
if you could take some thread dumps during those CPU spikes to see 
what is chewing up CPU ... you may just be seeing the effects of stored 
field compression -- which uses more CPU on stored field retrieval to 
decompress the blocks of field values, but allows the index size to be 
much smaller so more things can be cached in RAM.

: So my questions is if its possible to make this tall document to a wide
: document so only required information is fetched. Is this a better 
: approach to look for? Any other thoughts are welcomed.

I don't really understand what you mean by tall vs wide (I thought I 
understood what you meant by tall initially, but I don't understand what 
you mean by making the tall document wide).

Just in case it's not obvious: if there are stored fields you don't want 
back in the response, leave them out of your fl param and only request 
the fields you actually want.
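For example, restricting the response to a couple of fields looks like this (field names hypothetical):

```
q=*:*&fl=id,name,price&rows=10
```

Everything not listed in fl is then never decompressed or serialized for the response.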


-Hoss


Re: Adding field in Schema.xml

2013-05-18 Thread Kamal Palei
Hi Alex
I just saw that in the *types* area, long is already defined as

  <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
  omitNorms="true" positionIncrementGap="0"/>

Hence I hope I should be able to declare a long-type index in the *fields* area
as shown below.

  <field name="salary" type="long" indexed="true" stored="true"/>
  <field name="experience" type="long" indexed="true" stored="true"/>

Not sure, why it is not taking effect.

Best Regards
Kamal









Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-18 Thread Shalin Shekhar Mangar
Awesome news Rishi! Looking forward to your SolrCloud updates.








-- 
Regards,
Shalin Shekhar Mangar.


Re: Adding field in Schema.xml

2013-05-18 Thread Gora Mohanty
On 19 May 2013 08:36, Kamal Palei palei.ka...@gmail.com wrote:
 Hi Alex
 I just saw that in the *types* area, long is already defined as

   <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
   omitNorms="true" positionIncrementGap="0"/>

 Hence I hope I should be able to declare a long-type index in the *fields* area
 as shown below.

   <field name="salary" type="long" indexed="true" stored="true"/>
   <field name="experience" type="long" indexed="true" stored="true"/>

Yes, this should be fine.

 Not sure, why it is not taking effect.

What do you mean by not taking effect? You do not seem to have
made this clear anywhere in the thread.

Besides adding the fields to Solr's schema.xml, you have to make
sure that field values are picked up, and indexed properly into Solr.
How are you indexing? Have you reindexed after adding the fields?
Are you getting any errors in the logs after indexing?
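One quick way to verify the new fields after reindexing is a range query that counts documents having any value in the field, e.g.:

```
q=salary:[* TO *]&rows=0
```

If numFound comes back as 0, the field values were never indexed.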

Regards,
Gora