Re: Solr commit taking too long

2013-01-17 Thread Upayavira
Some questions:

What version of Solr?
Has the number of documents in your index changed in the meantime? 
How many before, how many now?
How does maxdocs compare to numdocs? 
Has this system ever been upgraded from an older Solr?
Is it committing that is taking that long, or opening a searcher once the
commit is done?

Maybe answers to these might help unpick your issue.

Upayavira

On Thu, Jan 17, 2013, at 06:22 AM, Cool Techi wrote:
 Hi,
 
 We have an index of approximately 400GB in size, indexing 5000 documents
 was taking 20 seconds. But lately, the indexing is taking very long,
 committing the same amount of documents is taking 5-20 mins. 
 
 On checking the logs I can see that there are frequent merges happening,
 which I am guessing is the reason for this; how can this be improved? My
 configurations are given below,
 
 <useCompoundFile>false</useCompoundFile>
 <mergeFactor>30</mergeFactor>
 <ramBufferSizeMB>64</ramBufferSizeMB>
 
 regards,
 Ayush
 


Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-17 Thread Uwe Reh

Hi Mark,

one entry in my long list of self made problems is:
Done the commit before the ConcurrentUpdateSolrServer was finished.

Since the ConcurrentUpdateSolrServer is asynchronous, it's very easy to 
create a race condition. Make sure that your program waits
before it does the commit.

if (solrserver instanceof ConcurrentUpdateSolrServer) {
   ((ConcurrentUpdateSolrServer) solrserver).blockUntilFinished();
}
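
For illustration, the overall ordering looks something like this. This is a
minimal sketch, not Uwe's actual code: the URL, queue size and thread count
are placeholders, "docs" stands for whatever you are indexing, and exception
handling is omitted.

ConcurrentUpdateSolrServer solrserver =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 100, 4);

for (SolrInputDocument doc : docs) {
    solrserver.add(doc);            // queued and sent asynchronously
}

solrserver.blockUntilFinished();    // drain the queue first
solrserver.commit();                // now the commit cannot race ahead of the adds

As Shawn notes in his reply below, a commit sent through the same
ConcurrentUpdateSolrServer object blocks on its own, so the explicit
blockUntilFinished() is then redundant, but harmless.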


Uwe



URL encoding problems

2013-01-17 Thread Bruno Dusausoy

Hi,

I have some problems related to URL encoding.
I'm using Solr 3.6.1 on a Windows (32 bit) system.
Apache Tomcat is version 6.0.36.
I'm accessing Solr through solrj-3.3.0.

When using the Solr admin and specifying my request, the URL looks like 
this (${SOLR} is there for the sake of brevity) :

${SOLR}/select?q=rapporteur_name%3A%28John+%2BSmith+%2B%5C%28FOO%5C%29%29

But when my app launches the query, the URL looks like this:
${SOLR}/select?q=rapporteur_name%3A%28John%5C+Smith%5C+%5C%28FOO%5C%29%29

My decoded query, as entered in the admin interface, is :
rapporteur_name:(John +Smith +\(FOO\))

Both requests return results, but only one of them returns the correct ones.

The code that escapes the query is :

SolrQuery query = new SolrQuery();
query.setQuery("rapporteur_name:(" + ClientUtils.escapeQueryChars("John Smith (FOO)") + ")");


I don't know if it's the right way to encode the query.
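
For comparison, a rough sketch (not from the original mail): escaping the whole
phrase also escapes the spaces, which is what produces the second URL above.
Escaping each term separately and adding the + operators by hand reproduces
the admin query, assuming the intent is to require both terms:

SolrQuery query = new SolrQuery();
String name = "John";
String surname = "Smith";
String org = "(FOO)";
// escape each value on its own, then assemble the operators unescaped
query.setQuery("rapporteur_name:(" + ClientUtils.escapeQueryChars(name)
        + " +" + ClientUtils.escapeQueryChars(surname)
        + " +" + ClientUtils.escapeQueryChars(org) + ")");
// yields: rapporteur_name:(John +Smith +\(FOO\))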

Any ideas or directions ?

Regards.
--
Bruno Dusausoy
Software Engineer
YP5 Software
--
Pensez environnement : limitez l'impression de ce mail.
Please don't print this e-mail unless you really need to.


Re: Response time in client was much longer than QTime in tomcat

2013-01-17 Thread Mikhail Khludnev
Hello,

QTime counts only searching and filtering, not writing the response, which
includes retrieving the stored fields (fl=...). So the difference is quite reasonable.
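
A quick way to see the two numbers side by side from SolrJ (a sketch only; the
URL is a placeholder, HttpSolrServer is the SolrJ 4.x class, and exception
handling is omitted):

SolrServer server = new HttpSolrServer("http://localhost:8080/solr/suit1");
QueryResponse rsp = server.query(new SolrQuery("*:*"));
// QTime: server-side search/filter time only
System.out.println("QTime = " + rsp.getQTime() + " ms");
// elapsed: also covers writing the response, transferring and parsing it
System.out.println("elapsed = " + rsp.getElapsedTime() + " ms");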


On Thu, Jan 17, 2013 at 7:09 AM, 张浓飞 zhangnong...@vancl.cn wrote:

  I have a solr website with about 500 docs ( 30 fields defined in the schema
 ), and a C# client on the same machine which sends HTTP GET requests to
 that solr website.

 These logs were recorded by my c# client:

 ** **

 01-16 23:54:49,301 [107] INFO LogHelper - requst time too long: 1054, solr
 time: 1003

 01-16 23:54:49,847 [63] INFO LogHelper - requst time too long: 1068, solr
 time: 1021

 01-16 23:57:17,813 [108] INFO LogHelper - requst time too long: 1051, solr
 time: 1027

 01-16 23:57:18,313 [111] INFO LogHelper - requst time too long: 1031, solr
 time: 1007

 and so on…

 ** **

 You can see, the query times from solr were so long and very similar
 (between 1000ms and 1050ms). At the same time, the corresponding logs in
 tomcat:

 ** **

 2013-1-16 23:54:49 org.apache.solr.core.SolrCore execute

 Info: [suit1] webapp=/vanclsearchV2 path=/select/
 params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(28976+OR+28978)&fq=typeid:(1)&rows=30}
 hits=43 status=0 QTime=0 

 2013-1-16 23:54:49 org.apache.solr.core.SolrCore execute

 Info: [suit1] webapp=/vanclsearchV2 path=/select/
 params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(28976+OR+28978)&fq=typeid:(1)&rows=30}
 hits=43 status=0 QTime=0

 2013-1-16 23:57:17 org.apache.solr.core.SolrCore execute

 Info: [suit1] webapp=/vanclsearchV2 path=/select/
 params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(27547+OR+27614)&rows=30}
 hits=9 status=0 QTime=0 

 2013-1-16 23:57:18 org.apache.solr.core.SolrCore execute

 Info: [suit1] webapp=/vanclsearchV2 path=/select/
 params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(27547+OR+27614)&rows=30}
 hits=9 status=0 QTime=0

 ** **

 Very strange, all the QTime values were zero! Can anyone explain this
 circumstance, and how to solve the problem?

 ** **


 --
 


 Domi.N.Zhang | Dev Center

 Email : zhangnong...@vancl.cn

 Tel:86-028-65528402

 I’m the coming days…

 ** **




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


how to get abortOnConfigurationError=false working

2013-01-17 Thread snake
I will explain the scenario just to avoid all the potential replies asking
why.

We run ColdFusion servers (Windows) which have Solr built in (running on
Jetty).
A customer creates a collection which is stored within their own webspace,
they only have read/write access to their own webspace so cannot put them
anywhere else.

the default value for abortOnConfigurationError is true.
This causes endless problems when customers make changes to their websites
or cancel their hosting, the collection gets deleted, and SOLR then crashes
because it cannot find the config files for that collection.
We then have to find out which collection is causing the problem, and
manually remove its entry from solr.xml

Obviously this is a PITA.

In the error output it says:

If you want solr to continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError>
in solr.xml

I have tried this, but it has no effect.
I have also tried putting it in all the solrconfig.xml files
I tried this
<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>
and this
<abortOnConfigurationError>false</abortOnConfigurationError>

neither had any effect.

How do you get this to work ?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Suggestion that preserve original phrase case

2013-01-17 Thread Erick Erickson
You could write a custom Filter (or perhaps Tokenizer), but I usually
just do it on the input side before things get sent to Solr.

I don't think PatternReplaceCharFilterFactory will help; you could
easily turn the input into original:original, but then you'd need to
write a custom filter that normalizes the left-hand-side but not the
right-hand-side.
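
As an illustration of doing it on the input side, a rough SolrJ sketch (field
names follow the schema quoted below; it assumes you drop the copyField and the
LowerCaseFilterFactory from text_auto so the pair is indexed exactly as sent,
CommonsHttpSolrServer is the 3.x client class, the URL is a placeholder, and
exception handling is omitted):

String label = "Hello world";
SolrInputDocument doc = new SolrInputDocument();
doc.addField("label", label);
// left-hand side normalized, right-hand side kept exactly as typed
doc.addField("label_autocomplete", label.toLowerCase() + ":" + label);

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
server.add(doc);
server.commit();

The application then splits the suggested token on ':' and displays the
right-hand side, as described above.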

Best
Erick

On Tue, Jan 15, 2013 at 11:27 AM, Selvam s.selvams...@gmail.com wrote:
 Thanks Erick, can you tell me how to do the appending
 (lowercaseversion:LowerCaseVersion) before indexing. I tried pattern
 factory filters, but I could not get it right.


 On Sun, Jan 13, 2013 at 8:49 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 One way I've seen this done is to index pairs like
 lowercaseversion:LowerCaseVersion. You can't push this whole thing through
 your field as defined since it'll all be lowercased, you have to produce
 the left hand side of the above yourself and just use KeywordTokenizer
 without LowercaseFilter.

 Then, your application displays the right-hand-side of the returned token.

 Simple solution, not very elegant, but sometimes the easiest...

 Best
 Erick


 On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote:

  Hi,

  I have been trying to figure out a way for a case-insensitive suggestion
  which should return the original phrase as the result. I am using Solr 3.5.

  For eg:

  If I index 'Hello world' and search for 'hello' it needs to return 'Hello
  world', not 'hello world'. My configurations are as follows,

  New field type:
  <fieldType class="solr.TextField" name="text_auto">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  Field values:
  <field name="label" type="text" indexed="true" stored="true"
   termVectors="true" omitNorms="true"/>
  <field name="label_autocomplete" type="text_auto" indexed="true"
   stored="true" multiValued="false"/>
  <copyField source="label" dest="label_autocomplete" />

  Spellcheck Component:
  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_auto</str>
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="buildOnOptimize">true</str>
      <str name="buildOnCommit">true</str>
      <str name="field">label_autocomplete</str>
    </lst>
  </searchComponent>
 
 
  Kindly share your suggestions to implement this behavior.
 
  --
  Regards,
  Selvam
  KnackForge http://knackforge.com
  Acquia Service Partner
  No. 1, 12th Line, K.K. Road, Venkatapuram,
  Ambattur, Chennai,
  Tamil Nadu, India.
  PIN - 600 053.
 




 --
 Regards,
 Selvam
 KnackForge http://knackforge.com
 Acquia Service Partner
 No. 1, 12th Line, K.K. Road, Venkatapuram,
 Ambattur, Chennai,
 Tamil Nadu, India.
 PIN - 600 053.


Re: SOlr 3.5 and sharding

2013-01-17 Thread Erick Erickson
You're still confusing shards (or at least mixing up the terminology)
with simple replication. Shards are when you split up the index into
several sub indexes and configure the sub-indexes to know about each
other. Say you have 1M docs in 2 shards. 500K of them would go on one
shard and 500K on the other. But logically you have a single index of
1M docs. So the two shards have to know about each other and when you
send a request to one of them, it automatically queries the other (as
well as itself), collects the response and combines them, returning
the top N to the requester.
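
For reference, in Solr 3.x such a distributed request is just an ordinary query
that carries a shards parameter listing every sub-index. A minimal SolrJ sketch
(host names are placeholders):

SolrQuery q = new SolrQuery("java");
// one request to any node; it fans out to every shard listed here
q.set("shards", "hostA:8983/solr,hostB:8983/solr");

sent through whatever SolrServer instance you already use.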

This is totally different from replication. In replication
(master/slave), each node has all 1M documents. Each node can work
totally in isolation. An incoming request is handled by the slave
without contacting any other node.

If you're copying around indexes AND configuring them as though they
were shards, each request will be distributed to all shards and the
results collated, giving you the same doc repeatedly in your result
set.

If you have no access to the indexing code, you really can't go to a
sharded setup.

Polling is when the slaves periodically ask the master has anything
changed? If so then the slave pulls down the changes. The polling
interval is configured in solrconfig.xml _on the slave_. So let's say
you index docs to the master. For some interval, until the slaves poll
the master and get an updated index, the number of searchable docs on
the master will be different than for the slaves. Additionally, you
may have the issue of the polling intervals for the slaves being
offset from one another, so for some brief interval the counts on the
slaves may be different as well.

Best
Erick

On Tue, Jan 15, 2013 at 10:18 AM, Jean-Sebastien Vachon
jean-sebastien.vac...@wantedanalytics.com wrote:
 Ok I see what Erick`s meant now.. Thanks.

 The original index I`m working on contains about 120k documents. Since I have 
 no access to the code that pushes documents into the index, I made four 
 copies of the same index.

 The master node contains no data at all, it simply uses the data available in 
 its four shards. Knowing that I have 1000 documents matching the keyword 
 "java" on each shard, I was expecting to receive 4000 documents out of my 
 sharded setup. There are only a few documents that are not accounted for (the 
 result count is about 3996, which is pretty close but not accurate).

 Right now, the index is static so there is no need for any replication so the 
 polling interval has no effect.
 Later this week, I will configure the replication and have the indexation 
 modified to  distribute the documents to each shard using a simple ID modulo 
 4 rule.

 Were my expectations wrong about the number  of documents?

 -Original Message-
 From: Upayavira [mailto:u...@odoko.co.uk]
 Sent: January-15-13 9:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOlr 3.5 and sharding

 He was referring to master/slave setup, where a slave will poll the master 
 periodically asking for index updates. That frequency is configured in 
 solrconfig.xml on the slave.

 So, you are saying that you have, say 1m documents in your master index.
 You then copy your index to four other boxes. At that point you have 1m 
 documents on each of those four. Eventually, you'll delete some docs, so you'd 
 have 250k on each. You're wondering why, before the deletes, you're not 
 seeing 1m docs on each of your instances.

 Or are you wondering why you're not seeing 1m docs when you do a distributed 
 query across all four of these boxes?

 Is that correct?

 Upayavira

 On Tue, Jan 15, 2013, at 02:11 PM, Jean-Sebastien Vachon wrote:
 Hi Erick,

 Thanks for your comments but I am migrating an existing index (single
 instance) to a sharded setup and currently I have no access to the
 code involved in the indexation process. That`s why I made a simple
 copy of the index on each shards.

 In the end, the data will be distributed among all shards.

 I was just curious to know why I had not the expected number of
 documents with my four shards.

 Can you elaborate on  this polling interval thing? I am pretty sure
 I never heard about this...

 Regards

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: January-15-13 8:00 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOlr 3.5 and sharding

 You're confusing shards and slaves here. Shards are splitting a
 logical index amongst N machines, where each machine contains a
 portion of the index. In that setup, you have to configure the slaves
 to know about the other shards, and the incoming query has to be
 distributed amongst all the shards to find all the docs.

 In your case, since you're really replicating (rather than sharding),
 you only have to query _one_ slave, the query doesn't need to be distributed.

 So pull all the sharding stuff out of your config files, put a load
 balancer in front of your slaves and only send the request to one of
 them would be the 

Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
I want to configure Field Collapsing, but my target field is multi-valued
(e.g. the field I want to group on has a variable # of entries per document,
1-N entries).

I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that
grouping doesn't support multi-valued fields yet.

Anything in the works on that front by chance?  Any common work-arounds?




Re: Large data importing getting rollback with solr

2013-01-17 Thread Otis Gospodnetic
Hi,

It looks like this is the cause:
JBC0016E: Remote call failed
(return code=-2,220). SDK9019E: internal errorSDK9019X:

Interestingly, Google gives just 1 hit for the above as query - your post.
But it seems you should look up what the above codes mean first...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 2:43 AM, ashimbose ashimb...@gmail.com wrote:

 I am trying to index large data (not rich documents), about 5GB, but it is
 not getting indexed. In the case of small data it indexes perfectly.

 For the large data import the XML response is:

   0 0 data-config.xml full-import busy A command is still running...
   0:9:12.738169
   1810790
   2013-01-17 12:50:13
   Indexing failed. Rolled back all changes.
   2013-01-17 12:50:30
   This response format is experimental. It is likely to change in the future.

 BUT for the small data index the XML response is perfectly OK, as below:

   0 0 data-config.xml full-import busy A command is still running...
   0:0:12.43611
   382090
   2013-01-17 12:56:57
   Indexing completed. Added/Updated: 38209 documents. Deleted 0 documents.
   This response format is experimental. It is likely to change in the future.

 For the large data the error log response is as below... It is getting a rollback:

 INFO: Time taken for getConnection(): 1343
 Jan 17, 2013 12:36:21 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBCODE_HAZ_BRA with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:23 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1341
 Jan 17, 2013 12:36:23 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBCODE_HAZ_TBL with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1357
 Jan 17, 2013 12:36:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBCODE_LANG with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:26 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1392
 Jan 17, 2013 12:36:26 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBCODE_TBL with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1535
 Jan 17, 2013 12:36:41 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBCODE_TBL_ARG with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:43 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1467
 Jan 17, 2013 12:36:43 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBCODE_TBL_BRA with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1373
 Jan 17, 2013 12:36:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBCOMP_TMP_MC with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:45 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1404
 Jan 17, 2013 12:36:45 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBFUNCTION_LNG with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:47 PM org.apache.solr.core.SolrCore execute
 INFO: [core1] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
 Jan 17, 2013 12:36:47 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1357
 Jan 17, 2013 12:36:47 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOBFUNCTION_TBL with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1310
 Jan 17, 2013 12:36:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity PS_JOB_APPROVALS with URL:
 jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
 Jan 17, 2013 12:36:50 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 1342
 Jan 17, 2013 12:36:50 PM
 

Re: Solr commit taking too long

2013-01-17 Thread Otis Gospodnetic
Hi,

That's a juicy index.  Is this on a single server?  Have you considered
sharding it and thus spreading the indexing work over multiple servers,
disks, etc.?
You could increase ramBufferSizeMB, which will help a bit with indexing
speed, but not with actual merging.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 1:22 AM, Cool Techi cooltec...@outlook.com wrote:

 Hi,

 We have an index of approximately 400GB in size, indexing 5000 documents
 was taking 20 seconds. But lately, the indexing is taking very long,
  committing the same amount of documents is taking 5-20 mins.

  On checking the logs I can see that there are frequent merges happening,
  which I am guessing is the reason for this; how can this be improved? My
  configurations are given below,

  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>30</mergeFactor>
  <ramBufferSizeMB>64</ramBufferSizeMB>

 regards,
 Ayush



Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread snake
here is what it says in the SOLR info page

Solr Specification Version: 1.4.0.2009.11.18.10.19.05
 Solr Implementation Version: 1.4.1-dev exported - kvinu - 2009-11-18
10:19:05
 Lucene Specification Version: 2.9.1
 Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25



On Thu, Jan 17, 2013 at 1:33 PM, Alexandre Rafalovitch [via Lucene] 
ml-node+s472066n4034156...@n3.nabble.com wrote:

 Which version of Solr is it for?

 I had a situation on Solr4, where I basically did not have a directory
 that
 solr.xml was pointing at for one of the cores. And Solr continued working
 but the Admin interface was showing big red banners about configuration
 problem.

 So, maybe it was a bug that was fixed for Solr 4?

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Thu, Jan 17, 2013 at 8:03 AM, snake [hidden email]
 wrote:

  I will explain the scenario just to avoid all the potential replies
 asking
  why.
 
  We run coldFusion servers (windows) which has SOLR built in (running on
  Jetty).
  A customer creates a collection which is stored within their own
 webspace,
  they only have read/write access to their own webspace so cannot put
 them
  anywhere else.
 
  the default value for abortOnConfigurationError is true.
  This causes endless problems when customers make changes to their
 websites
  or cancel their hosting, the collection gets deleted, and SOLR then
 crashes
  because it cannot find the config files for that collection.
  We then have to find out which collection is causing the problem, and
  manually remove its entry from solr.xml
 
  Obviously this is a PITA.
 
  In the error output it says.
 
  If you want solr to continue after configuration errors, change:
   <abortOnConfigurationError>false</abortOnConfigurationError>
  in solr.xml
 
  I have tried this, but it has no effect.
  I have also tried putting it in all the solrconfig.xml files
  I tried this
 
 
  <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

  and this
   <abortOnConfigurationError>false</abortOnConfigurationError>
 
  neither had any effect.
 
  How do you get this to work ?
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034156.html




-- 

--

Russ Michaels

www.bluethunderinternet.com  : Business hosting services  solutions
www.cfmldeveloper.com: ColdFusion developer community
www.michaels.me.uk   : my blog
www.cfsearch.com : ColdFusion search engine
skype me : russmichaels




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034178.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: URL encoding problems

2013-01-17 Thread Jack Park
Similar thoughts: I used unit tests to explore that issue with SolrJ,
originally encoding with ClientUtils; the returned results had | in
many places in the text, with no clear way to un-encode it. I eventually
ran some tests with no encoding at all, including strings like
<tag>hello & goodbye</tag>; such strings were served and fetched
without errors. In queries at the admin console, they show up in the
JSON results correctly.  What's left? I share the confusion about what
is really going on.

Jack

On Thu, Jan 17, 2013 at 2:44 AM, Bruno Dusausoy bdusau...@yp5.be wrote:
 Hi,

 I have some problems related to URL encoding.
 I'm using Solr 3.6.1 on a Windows (32 bit) system.
 Apache Tomcat is version 6.0.36.
 I'm accessing Solr through solrj-3.3.0.

 When using the Solr admin and specifying my request, the URL looks like this
 (${SOLR} is there for the sake of brevity) :
 ${SOLR}/select?q=rapporteur_name%3A%28John+%2BSmith+%2B%5C%28FOO%5C%29%29

 But when my app launches the query, the URL looks like this:
 ${SOLR}/select?q=rapporteur_name%3A%28John%5C+Smith%5C+%5C%28FOO%5C%29%29

 My decoded query, as entered in the admin interface, is :
 rapporteur_name:(John +Smith +\(FOO\))

 Both requests return results, but only one of them returns the correct ones.

 The code that escapes the query is :

 SolrQuery query = new SolrQuery();
 query.setQuery("rapporteur_name:(" + ClientUtils.escapeQueryChars("John Smith (FOO)") + ")");

 I don't know if it's the right way to encode the query.

 Any ideas or directions ?

 Regards.
 --
 Bruno Dusausoy
 Software Engineer
 YP5 Software
 --
 Pensez environnement : limitez l'impression de ce mail.
 Please don't print this e-mail unless you really need to.


Re: group.ngroups behavior in response

2013-01-17 Thread denl0
There's a parameter to enable that. :D

In solrJ

solrQuery.setParam("group.ngroups", true);

http://wiki.apache.org/solr/FieldCollapsing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/group-ngroups-behavior-in-response-tp4033924p4034187.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-17 Thread Shawn Heisey

On 1/17/2013 3:32 AM, Uwe Reh wrote:

one entry in my long list of self made problems is:
Done the commit before the ConcurrentUpdateSolrServer was finished.

Since the ConcurrentUpdateSolrServer is asynchronous, it's very easy to
create a race condition. Make sure that your program waits
before it does the commit.

if (solrserver instanceof ConcurrentUpdateSolrServer) {
   ((ConcurrentUpdateSolrServer) solrserver).blockUntilFinished();
}


If you are using the same ConcurrentUpdateSolrServer object for all 
update interaction with Solr (including commits) and you still have to 
do the blockUntilFinished() in your own code before you issue an 
explicit commit, that sounds like a bug, and you should put all the 
details in a Jira issue.


The following code is part of the request method in CUSS:

// this happens for commit...
if (req.getDocuments() == null || req.getDocuments().isEmpty()) {
  blockUntilFinished();
  return server.request(request);
}

This means that if you use the same CUSS object for update interaction 
with Solr (including commits), the object will do the waiting for you 
when you make an explicit commit() call.  If you issue a commit with a 
different object (either another instance of CUSS or HttpSolrServer), 
then this won't work and you'd have to handle it yourself.


For error handling, I filed SOLR-3284 and provided a patch.  It hasn't 
been committed, I think mostly because it doesn't give any specific 
information about what failed.  I have an idea for how to improve the 
patch to address committer concerns, but until I have some time to 
actually look at it, I won't know if it's viable.  When I have a moment, 
I'll update the issue with details about my idea.


Thanks,
Shawn



Re: Search strategy - improving search quality for short search terms such as doll

2013-01-17 Thread Otis Gospodnetic
Hi David,

I think this is where search analytics can help.  If your intuition is
right and people who search for "doll" are not actually searching for "doll
face"... CDs, then search analytics will confirm that.  The analytics I'm
talking about involves search and click tracking and analysis.  Once you
have this data you can play with boosting queries, altering queries, etc.
based on this historical knowledge about what people who searched for X
tend to do after the search.
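
As a toy illustration of that last step, the historical click data might end up
expressed as an edismax boost query; the category field and weight here are
hypothetical, not something Solr defines for you:

SolrQuery q = new SolrQuery("doll");
q.set("defType", "edismax");
// favour the category that click data showed people actually want for this term
q.set("bq", "category:toys^2.0");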

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Jan 16, 2013 at 9:51 PM, David Parks davidpark...@yahoo.com wrote:

 My issue is more that the search term doll shows up in both documents on
 CDs
 as well as documents about toys. But I have 10 CD documents for every toy
 document, so my searches for doll tend to show the CDs most prominently.
 But that's not the way a user thinks. If they want the CD documents they'll
 search for doll face, or doll face song, more specific queries (which
 work fine), but if they want the toy they might just search for doll.

 If I run the searches doll and doll song on google image search you'll
 clearly see that google has solved this problem perfectly. doll returns
 toy dolls, and doll song returns music and anime results.

 I'm striving for this type of result.



 -Original Message-
 From: Amit Jha [mailto:shanuu@gmail.com]
 Sent: Wednesday, January 16, 2013 11:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Search strategy - improving search quality for short search
 terms such as doll

 It's all about the data set, here I mean the index. If you have documents
 containing "toy" and "doll" it will return those in the result set.

 What I understood is that you are talking about the context of the query. For
 example, if you search "books on MK Gandhi" and "books by MK Gandhi", both
 queries have a different context.

 Context-based search is to some extent achieved by natural language processing.
 You can look at that for better search.

 The Solr wiki & mailing list would be a great source of learning.


 Rgds
 AJ

 On 16-Jan-2013, at 15:10, David Parks davidpark...@yahoo.com wrote:

  I'm a beginner-intermediate solr admin, I've set up the basics for our
  application and it runs well.
 
 
 
  Now it's time for me to dig in and start tuning and improving queries.
 
 
 
  My next target is searches on simple terms such as doll which, in
  google, would return documents about, well, toy dolls, because
  that's the most common usage of the simple term doll. But in my
  index it predominantly returns documents about CDs with the song Doll
  Face, and My baby doll in them.
 
 
 
  I'm not directly asking how to solve this as much as I'm asking what
  direction I should be looking in to learn what I need to know to
  tackle the general issue myself.
 
 
 
  Left on my own I would start looking at categorizing the CD's into a
  facet called music, reasonably doable in my dataset. Then I need to
  reduce the boost-value of the entire facet/category of music unless
  certain pre-defined query terms exist, such as [music, cd, song,
  listen, dvd, analyze actual user queries to come up with a more
 exhaustive list, etc.].
 
 
 
  I don't yet know how to do all of this, but after a couple more good
  books I should be dangerous.
 
 
 
  So the question to this list:
 
 
 
  -  Am I on the right track here?  If not, can you point me in a
  direction to go?
 
 
 
 
 




Re: Large data importing getting rollback with solr

2013-01-17 Thread Shawn Heisey

ashimbose,

It is possible that this is happening because Solr reaches a point where 
it is doing so many simultaneous merges that ongoing indexing is stopped 
until a huge merge finishes.  This causes the JDBC driver to time out 
and disconnect, and there is no viable generic way to recover from that 
problem.


I used to run into this with large MySQL imports.  If this is what's 
happening, the following change/addition in the mergeScheduler section 
of indexConfig in solrconfig.xml will fix it:


  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>

If that doesn't fix it, then I would look for a problem with either your 
JDBC driver or your DB server.


Thanks,
Shawn


On 1/17/2013 7:19 AM, Otis Gospodnetic wrote:

Hi,

It looks like this is the cause:
JBC0016E: Remote call failed
(return code=-2,220). SDK9019E: internal errorSDK9019X:

Interestingly, Google gives just 1 hit for the above as query - your post.
But it seems you should look up what the above codes mean first...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 2:43 AM, ashimbose ashimb...@gmail.com wrote:


I am trying to index large data (not rich documents), about 5GB, but it is not
getting indexed. In the case of small data it indexes perfectly. For the large data
import the XML response...


Re: group.ngroups behavior in response

2013-01-17 Thread Tomás Fernández Löbbe
But Amit is right: when you use group.main, the number of groups is not
displayed, even if you set group.ngroups.

I think in this case numFound should display the number of groups instead
of the number of docs matching. Another option would be to keep numFound as
the number of docs matching and add another attribute to the response that
shows the number of groups.


On Thu, Jan 17, 2013 at 11:51 AM, denl0 david.vandendriess...@gmail.comwrote:

 There's a parameter to enable that. :D

 In solrJ

 solrQuery.setParam("group.ngroups", true);

 http://wiki.apache.org/solr/FieldCollapsing



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/group-ngroups-behavior-in-response-tp4033924p4034187.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: group.ngroups behavior in response

2013-01-17 Thread Otis Gospodnetic
I'd think adding a new response attribute would be more flexible and
powerful, thinking about clients, UIs, etc.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 10:15 AM, Tomás Fernández Löbbe 
tomasflo...@gmail.com wrote:

  But Amit is right: when you use group.main, the number of groups is not
  displayed, even if you set group.ngroups.

 I think in this case NumFound should display the number of groups instead
 of the number of docs matching. Other option would be to keep numFound as
 the number of docs matching and add another attribute to the response that
 shows the number of groups.


 On Thu, Jan 17, 2013 at 11:51 AM, denl0 david.vandendriess...@gmail.com
 wrote:

  There's a parameter to enable that. :D
 
  In solrJ
 
  solrQuery.setParam("group.ngroups", true);
 
  http://wiki.apache.org/solr/FieldCollapsing
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/group-ngroups-behavior-in-response-tp4033924p4034187.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Solr commit taking too long

2013-01-17 Thread Shawn Heisey

On 1/16/2013 11:22 PM, Cool Techi wrote:

We have an index of approximately 400GB in size, indexing 5000 documents was 
taking 20 seconds. But lately, the indexing is taking very long, committing the 
same amount of documents is taking 5-20 mins.

On checking the logs I can see that there are frequent merges happening, which I 
am guessing is the reason for this; how can this be improved? My configurations 
are given below,

<useCompoundFile>false</useCompoundFile>
<mergeFactor>30</mergeFactor>
<ramBufferSizeMB>64</ramBufferSizeMB>


What version of Solr?  Version 4 will finish merges in the background 
even after indexing and commits are complete, although you do have to 
have a high enough maxMergeCount so that indexing stays in the 
foreground.  I use a maxMergeCount of 6 which seems to work for all 
situations.


Another thing that makes commits take an extremely long time is high 
autowarmCount values on Solr caches, especially filterCache.


Thanks,
Shawn



Solr multicore aborts with socket timeout exceptions

2013-01-17 Thread eShard
I'm currently running Solr 4.0 final on tomcat v7.0.34 with ManifoldCF v1.2
dev running on Jetty.

I have solr multicore set up with 10 cores. (Is this too much?)
So I also have at least 10 connectors set up in ManifoldCF (1 per core, 10
JVMs per connection).
From the look of it, Solr couldn't handle all the data that ManifoldCF was
sending it and the connection would abort with socket timeout exceptions.
I tried increasing the maxThreads to 200 on tomcat and it didn't work.
In the ManifoldCF throttling section, I decreased the number of JVMs per
connection from 10 down to 1 and not only did the crawl speed up
significantly, the socket exceptions went away (for the most part)
Here's the ticket for this issue:
https://issues.apache.org/jira/browse/CONNECTORS-608

My question is this: how do I increase the number of connections on the solr
side so I can run multiple ManifoldCF jobs concurrently without aborting or
timeouts?

The ManifoldCF team did mention that there was a committer who had socket
timeout exceptions in a newer version of Solr and he fixed it by increasing
the timeout window. I'm looking for that patch if available.

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-multicore-aborts-with-socket-timeout-exceptions-tp4034250.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOlr 3.5 and sharding

2013-01-17 Thread Jean-Sebastien Vachon
Hi Erick,

It looks like we are saying the exact same thing but with different terms ;) 
I looked at the Solr glossary and you might be right.. maybe I should talk 
about partitions instead of shards.

Since my last message, I`ve configured the replication between the master and 
slave and everything is working fine except for my original question about the 
number of documents not matching my expectations.

I`ll try to clarify a few things and come back to this question...

Machine A (which I called the master node) is where the indexation takes place.
It consists of four Solr instances that will (eventually) each contain 1/4 of the 
entire collection. It's just that, at this moment, since I have no control over 
which partition a given document is sent to, I made copies of the same index for 
all partitions. Each Solr instance has a replication handler configured. I 
will eventually get to the point of changing the indexation code to distribute 
documents evenly across all partitions, but the person who can give me access to 
this portion is not available right now, so I can do nothing about it.

Machine B has the same four shards setup to be replicas of the corresponding 
shard on machine A.
Machine B also contains another Solr instance with the default handler 
configured to use the four local partitions. This instance receives clients' 
requests, collects the results from each partition and then selects the best 
matches to form the final response. We intend to add new slaves that are exact 
copies of Machine B and load balance client requests across all slaves.

My original question was: if each partition has 1000 documents matching a 
certain keyword, and I know all partitions have the same content, then I was 
expecting to receive 4*1000 documents for the same keyword. But that is not the 
case.
The replication is not an issue here since the same request on the master node 
will give me the same result.

Each shard when called individually will give 1000 documents. But when I call 
them using the shards=xxx parameters then I am getting a little less than 4000 
documents. I was just curious to know why this was happening... Is this a bug? 
Or something I am misunderstanding...

Thanks for your time and contribution to Solr!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: January-17-13 8:46 AM
To: solr-user@lucene.apache.org
Subject: Re: SOlr 3.5 and sharding

You're still confusing shards (or at least mixing up the terminology) with 
simple replication. Shards are when you split up the index into several sub 
indexes and configure the sub-indexes to know about each other. Say you have 
1M docs in 2 shards. 500K of them would go on one shard and 500K on the other. 
But logically you have a single index of 1M docs. So the two shards have to 
know about each other and when you send a request to one of them, it 
automatically queries the other (as well as itself), collects the response and 
combines them, returning the top N to the requester.

This is totally different from replication. In replication (master/slave), each 
node has all 1M documents. Each node can work totally in isolation. An incoming 
request is handled by the slave without contacting any other node.

If you're copying around indexes AND configuring them as though they were 
shards, each request will be distributed to all shards and the results 
collated, giving you the same doc repeatedly in your result set.

If you have no access to the indexing code, you really can't go to a sharded 
setup.

Polling is when the slaves periodically ask the master has anything changed? 
If so then the slave pulls down the changes. The polling interval is configured 
in solrconfig.xml _on the slave_. So let's say you index docs to the master. 
For some interval, until the slaves poll the master and get an updated index, 
the number of searchable docs on the master will be different than for the 
slaves. Additionally, you may have the issue of the polling intervals for the 
slaves being offset from one another, so for some brief interval the counts on 
the slaves may be different as well.

Best
Erick

On Tue, Jan 15, 2013 at 10:18 AM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:
 Ok I see what Erick`s meant now.. Thanks.

 The original index I`m working on contains about 120k documents. Since I have 
 no access to the code that pushes documents into the index, I made four 
 copies of the same index.

  The master node contains no data at all, it simply uses the data available in 
  its four shards. Knowing that I have 1000 documents matching the keyword 
  "java" on each shard, I was expecting to receive 4000 documents out of my 
  sharded setup. There are only a few documents that are not accounted for (the 
  result count is about 3996, which is pretty close but not accurate).

 Right now, the index is static so there is no need for any replication so the 
 polling interval has no effect.
 Later this week, I 

Re: Solr 4 slower than Solr 3.x?

2013-01-17 Thread Otis Gospodnetic
Hello,

Here is another one from the other day:
http://search-lucene.com/m/tqmNjXO51B/SolrCloud+Performance+for+High+Query+Volume

Am I the only one seeing people reporting this? :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jan 14, 2013 at 10:55 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 I've seen this mentioned on the ML a few times now with the most recent
 one being:


 http://search-lucene.com/m/mbT4g1fQPr91/?subj=Solr+4+0+upgrade+reduced+performance

 Are there any known, good Solr 3.x vs. Solr 4.x benchmarks?

 Thanks,
 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/






Function Query vs. Analyzing results

2013-01-17 Thread John
Hi,

Is there any performance boost when using FunctionQuery over getting all
the documents and analyzing their result fields?

As far as I understand, a FunctionQuery does exactly that: for each matched
document it fetches the fields you're interested in, and then it calculates
whatever score mechanism you need.

Are there some special configurations that I can use that make
FunctionQueries faster?

Cheers,
John


Re: Function Query vs. Analyzing results

2013-01-17 Thread Mikhail Khludnev
Hello John,

 getting all the documents and analyzing their result fields?

is almost never feasible. Lucene stored fields are usually really slow.

When a FunctionQuery is backed by field values it uses the Lucene FieldCache,
which is an array of field values and is much faster.

You are welcome.


On Thu, Jan 17, 2013 at 8:20 PM, John fatmanc...@gmail.com wrote:

 Hi,

 Is there any performance boost when using FunctionQuery over getting all
 the documents and analyzing their result fields?

 As far as I understand, Function Query does exactly that, for each matched
 document it feches the fields you're interested at, and then it calculates
 whatever score mechanism you need.

 Are there some special configurations that I can use that take make
 FunctionQueries faster?

 Cheers,
 John




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Function Query vs. Analyzing results

2013-01-17 Thread John
Hi Mikhail,

Thanks for the info.

If my FunctionQuery accesses stored fields like that:

public float floatVal(int docNum) {

  Document doc = null;
  try { doc = reader.document(docNum); } catch (Exception e) {}
  return getSimilarityScore(doc);
}

Is it still the same case? Is there a faster way to access document info?


Cheers,

John




On Thu, Jan 17, 2013 at 6:40 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello John,

  getting all the documents and analyzing their result fields?

 is almost not ever possible. Lucene stored fields usually are really slow.

 when FunctionQueries is backed of field values it uses Lucene FieldCache,
 which is array of field values that's damn faster.

 You are welcome.


 On Thu, Jan 17, 2013 at 8:20 PM, John fatmanc...@gmail.com wrote:

  Hi,
 
  Is there any performance boost when using FunctionQuery over getting all
  the documents and analyzing their result fields?
 
  As far as I understand, Function Query does exactly that, for each
 matched
  document it feches the fields you're interested at, and then it
 calculates
  whatever score mechanism you need.
 
  Are there some special configurations that I can use that take make
  FunctionQueries faster?
 
  Cheers,
  John
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



MultiValue

2013-01-17 Thread anurag.jain
My JSON file looks like:

[ { "last_name" : "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

Can you please suggest how I should declare the field in the schema for the
training_skill field?



please reply 

urgent





--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiValue-tp4034305.html
Sent from the Solr - User mailing list archive at Nabble.com.


searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses

2013-01-17 Thread geeky2
hello

environment: solr 3.5

problem statement:

i have a requirement to search for part numbers that start with a dash /
hyphen.

example q= term: *-0004A-0436*

example query:

http://some_url:some_port/some_core/select?facet=false&sort=score+desc%2C+rankNo+asc%2C+partCnt+desc&start=0&q=*-0004A-0436*+itemType%3A1&wt=xml&qt=itemModelNoProductTypeBrandSearch&rows=4

what is happening: query is returning a huge results set.  in reality there
is one (1) and only one record in the database with this part number.

i believe this is happening because the dash is being interpreted by the
query parser as a prohibited clause and the effective result is, give me
everything that does NOT have this part number.

how is this handled so that the search is conducted for the actual part:
-0004A-0436
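
One common workaround, offered here only as a hedged sketch rather than
something from this thread, is to keep the query parser from treating the
leading hyphen as an operator, either by quoting the term or by escaping the
hyphen (field name taken from the schema below):

SolrQuery q = new SolrQuery();
// quote the term so the leading '-' is taken literally
q.setQuery("itemModelNoExactMatchStr:\"-0004A-0436\"");
// or escape the leading hyphen instead:
// q.setQuery("itemModelNoExactMatchStr:\\-0004A-0436");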

thx
mark

more information:

request handler in solrconfig.xml

  <requestHandler name="itemModelNoProductTypeBrandSearch"
      class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemModelNoExactMatchStr^30 itemModelNo^.9
          divProductTypeDesc^.8 plsBrandDesc^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemModelDescFacet</str>
      <str name="facet.field">plsBrandDescFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
    </lst>
  </requestHandler>


field information from schema.xml (if helpful)

<field name="itemModelNoExactMatchStr" type="text_general_trim"
    indexed="true" stored="true"/>

<field name="itemModelNo" type="text_en_splitting" indexed="true"
    stored="true" omitNorms="true"/>

<field name="divProductTypeDesc" type="text_general_edge_ngram"
    indexed="true" stored="true" multiValued="true"/>

<field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true"
    stored="true" multiValued="true"/>


<fieldType name="text_general_trim" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\."
        replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
        maxGramSize="15" side="front"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
        preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_general_edge_ngram" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
        maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>






--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-for-q-terms-that-start-with-a-dash-hyphen-being-interpreted-as-prohibited-clauses-tp4034310.html
Sent from the Solr - User mailing list archive at Nabble.com.


Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-17 Thread oakstream
Hello,
I have point data (lat/lon) stored in hbase/hadoop and would like to query
the data spatially with polygons.  (If I pass in a few polygons find me all
the records that exist within these polygons.  I need it to support polygons
not just box queries).  Hadoop doesn't really have much support that I could
find for these types of queries.  I was wondering if I could leverage SOLR
spatial 4 and create spatial indexes on the hbase data that could be used to
query this data?? I need near real-time answers (within a couple seconds). 

If anyone has any thoughts on this I would greatly appreciate them.

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MultiValue

2013-01-17 Thread Dikchant Sahi
You just need to make the field multiValued:

<field name="last_name" type="string" indexed="true" stored="true" />
<field name="trainingskill" type="string" indexed="true" stored="true"
    multiValued="true" />

type should be set based on your search requirements.

On Thu, Jan 17, 2013 at 11:27 PM, anurag.jain anurag.k...@gmail.com wrote:

 my json file look like

 [ { last_name : jain, training_skill:[c, c++, php,java,.net] }]

 can u please suggest me how should i declare field in schema for
 trainingskill field



 please reply

 urgent





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/MultiValue-tp4034305.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: MultiValue

2013-01-17 Thread anurag.jain
  [ { "last_name" : "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

Actually I want to tokenize it into: c c++ php java .net


so that through this I can make them into facets.


But the problem is the last entry in the list:
training_skill: ["c", "c++", "php,java,.net"]






--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiValue-tp4034305p4034316.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MultiValue

2013-01-17 Thread Dikchant Sahi
You mean to say that the problem is with the JSON which is being ingested.

What you are trying to achieve is to split the values on
commas and index them as multiple values.

What problem are you facing in indexing the JSON in the format Solr expects? If you
don't have control over it, you can probably try playing with custom update
processors.
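
If the JSON can be pre-processed on the client side, one simple option is to
split the comma-joined entry yourself and send each skill as a separate value
of a multiValued field. A rough SolrJ sketch (field names follow the thread;
the URL is a placeholder, HttpSolrServer is the 4.x client class, and
exception handling is omitted):

SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("last_name", "jain");
// "php,java,.net" arrives as a single list element; split it into separate values
for (String entry : new String[] {"c", "c++", "php,java,.net"}) {
    for (String skill : entry.split(",")) {
        doc.addField("training_skill", skill.trim());
    }
}
server.add(doc);
server.commit();

Each value then becomes its own facet term, provided the field is a
multiValued string type.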




On Fri, Jan 18, 2013 at 12:31 AM, anurag.jain anurag.k...@gmail.com wrote:

   [ { last_name : jain, training_skill:[c, c++, php,java,.net]
 }
 ]

 actually i want to tokenize in   c c++ php java .net


 so through this i can make them as facet.


 but problem is in list
 training_skill:[c, c++, *php,java,.net*]






 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/MultiValue-tp4034305p4034316.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Function Query vs. Analyzing results

2013-01-17 Thread Mikhail Khludnev
No, no, no. Your implementation is as slow as result processing, because it uses
stored fields.
The fast way is something like
org.apache.solr.schema.IntField.getValueSource(SchemaField, QParser).
It's worth checking how the standard functions are built - see the static
{} block in org.apache.solr.search.ValueSourceParser.
I just googled this tutorial and found it rather useful for you. Feel free
to check it out:
http://www.solrtutorial.com/custom-solr-functionquery.html
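
To make the contrast concrete, here is a rough sketch of the FieldCache-style
lookup John's floatVal could use instead of loading stored documents. It assumes
the Lucene 3.x FieldCache API, a numeric field named "price", and a
getSimilarityScore helper adapted to take the raw value; all of these are
placeholders, not something from the thread:

// fetched once per index reader, not once per document
final float[] price = FieldCache.DEFAULT.getFloats(reader, "price");

public float floatVal(int docNum) {
    // plain array lookup instead of reader.document(docNum)
    return getSimilarityScore(price[docNum]);
}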


On Thu, Jan 17, 2013 at 8:53 PM, John fatmanc...@gmail.com wrote:

 Hi Mikhail,

 Thanks for the info.

 If my FunctionQuery accesses stored fields like that:

 public float floatVal(int docNum) {

   Document doc = null;
   try { doc = reader.document(docNum); } catch (Exception e) {}
   return getSimilarityScore(doc);
 }

 Is it still the same case? Is there a faster way to access document info?


 Cheers,

 John




 On Thu, Jan 17, 2013 at 6:40 PM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

  Hello John,
 
   getting all the documents and analyzing their result fields?
 
  is almost not ever possible. Lucene stored fields usually are really
 slow.
 
  when FunctionQueries is backed of field values it uses Lucene FieldCache,
  which is array of field values that's damn faster.
 
  You are welcome.
 
 
  On Thu, Jan 17, 2013 at 8:20 PM, John fatmanc...@gmail.com wrote:
 
   Hi,
  
   Is there any performance boost when using FunctionQuery over getting
 all
   the documents and analyzing their result fields?
  
   As far as I understand, Function Query does exactly that, for each
  matched
   document it feches the fields you're interested at, and then it
  calculates
   whatever score mechanism you need.
  
   Are there some special configurations that I can use that take make
   FunctionQueries faster?
  
   Cheers,
   John
  
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
   mkhlud...@griddynamics.com
 




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: MultiValue

2013-01-17 Thread anurag.jain
actually, in [ { last_name : jain, training_skill:[c, c++,
php,java,.net] } ], training_skill is a list. If I want to store it in a
string field type, will it include the [ and , as well? How do I avoid that, or
will it not? 


or do you have any other field type definition through which my work will be
easy. 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiValue-tp4034305p4034327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread Mikhail Khludnev
Snake,

It was killed in 4.0/trunk more than two years ago
https://issues.apache.org/jira/browse/SOLR-1846
Setting abortOnConfigurationError==false has not worked for some time, and
based on a POLL of existing users, no one seems to need/want it,
You might be in that rare case where it didn't work even back then.


On Thu, Jan 17, 2013 at 6:21 PM, snake r...@michaels.me.uk wrote:

 here is what it says in the SOLR info page

 Solr Specification Version: 1.4.0.2009.11.18.10.19.05
  Solr Implementation Version: 1.4.1-dev exported - kvinu - 2009-11-18
 10:19:05
  Lucene Specification Version: 2.9.1
  Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25



 On Thu, Jan 17, 2013 at 1:33 PM, Alexandre Rafalovitch [via Lucene] 
 ml-node+s472066n4034156...@n3.nabble.com wrote:

  Which version of Solr is it for?
 
  I had a situation on Solr4, where I basically did not have a directory
  that
  solr.xml was pointing at for one of the cores. And Solr continued working
  but the Admin interface was showing big red banners about configuration
  problem.
 
  So, maybe it was a bug that was fixed for Solr 4?
 
  Regards,
 Alex.
 
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
  On Thu, Jan 17, 2013 at 8:03 AM, snake [hidden email]
 http://user/SendEmail.jtp?type=nodenode=4034156i=0
  wrote:
 
   I will explain the scenario just to avoid all the potential replies
  asking
   why.
  
   We run coldFusion servers (windows) which has SOLR built in (running on
   Jetty).
   A customer creates a collection which is stored within their own
  webspace,
   they only have read/write access to their own webspace so cannot put
  them
   anywhere else.
  
   the default value for abortOnConfigurationError is true.
   This causes endless problems when customers make changes to their
  websites
   or cancel their hosting, the collection gets deleted, and SOLR then
  crashes
   because it cannot find the config files for that collection.
   We then have to find out which collection is causing the problem, and
   manually remove its entry from solr.xml
  
   Obviously this is a PITA.
  
   In the error output it says.
  
   If you want solr to continue after configuration errors, change:
   abortOnConfigurationErrorfalse/abortOnConfigurationError
   in solr.xml
  
   I have tried this, but it has no effect.
   I have also tried putting it in all the solrconfig.xml files
   I tried this
  
  
 
 abortOnConfigurationError${solr.abortOnConfigurationError:false}/abortOnConfigurationError
 
   and this
   abortOnConfigurationErrorfalse/abortOnConfigurationError
  
   neither had any effect.
  
   How do you get this to work ?
  
  
  
  
   --
   View this message in context:
  
 
 http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 
 
  --
   If you reply to this email, your message will be added to the discussion
  below:
 
 
 http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034156.html
 



 --

 --

 Russ Michaels

 www.bluethunderinternet.com  : Business hosting services  solutions
 www.cfmldeveloper.com: ColdFusion developer community
 www.michaels.me.uk   : my blog
 www.cfsearch.com : ColdFusion search engine
 **
 *skype me* : russmichaels




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034178.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread Mikhail Khludnev
David,

What's the documents and the field? It can help to suggest workaround.


On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com wrote:

 I want to configure Field Collapsing, but my target field is multi-valued
 (e.g. the field I want to group on has a variable # of entries per
 document,
 1-N entries).

 I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that
 grouping doesn't support multi-valued fields yet.

 Anything in the works on that front by chance?  Any common work-arounds?





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: MultiValue

2013-01-17 Thread Gora Mohanty
On 18 January 2013 00:31, anurag.jain anurag.k...@gmail.com wrote:

   [ { last_name : jain, training_skill:[c, c++, php,java,.net]
 }
 ]

 actually i want to tokenize in   c c++ php java .net

What do you mean by tokenize in this case? It has
been a while since I had occasion to use JSON input,
and also do not remember which Solr version introduced
this, but with a JSON array mapped to a multi-valued
Solr field, you should get one value per entry in the array.
http://wiki.apache.org/solr/UpdateJSON#Update_Commands
seems to be in agreement.

 so through this i can make them as facet.


 but problem is in list
 training_skill:[c, c++, *php,java,.net*]

Faceting should be straightforward. Are you not
seeing the behaviour described above? Could
you describe the issues that you are facing in
more detail?

Regards,
Gora


Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-17 Thread Chris Hostetter

: You're not only giving up the ability to monitor things, you're also giving up
: the ability to detect errors.  All exceptions that get thrown by the internals
: of ConcurrentUpdateSolrServer are swallowed, your code will never know they
: happened.  The client log (slf4j with whatever binding  config you chose) may
: have such errors logged, but they are completely undetectable by the code.

This isn't the first time i've seen someone make this claim, but i really 
don't understand it -- ConcurrentUpdateSolrServer has a handleError() 
method that gets called when an error happens during the async processing.  
By default it just logs the exception, if you want to do something more 
interesting with it in your code, just subclass ConcurrentUpdateSolrServer 
and override that method -- that's the entire point of that method.

The bigger issue is whether your client code could reasonably do anything 
if/when that method is called -- because it's all async, you probably 
can't do much more than log/report it in your own custom way instead of 
just using org.slf4j.Logger.
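
For example, a rough sketch (queue size, thread count and the error-tracking
field are arbitrary):

final AtomicReference<Throwable> lastError = new AtomicReference<Throwable>();
ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 100, 4) {
      @Override
      public void handleError(Throwable ex) {
        // remember the failure so the indexing code can check it after
        // blockUntilFinished()/commit and decide what to do (e.g. retry)
        lastError.set(ex);
      }
    };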


-Hoss


Re: MultiValue

2013-01-17 Thread Alexandre Rafalovitch
I think the problem here is that the list has 3 values, but the last one is
actually a set of several as well. Anurag seems to want to split them
into separate values whether they come as individual array items or as part
of a joint list. So, we have a mix of multiValue submission and a desire to
split it out.

The correct solution I suspect would be to normalize everything to just be
training_skill:[c, c++, php, java, .net] before this hits Solr.

However, since he wants this for facets and as a training exercise, one
could remember that facet values come from the tokens, not the stored value.
So, it might be possible to do this:
<field name="test" type="comaSplit" indexed="true" stored="true"
multiValued="true"/>
<fieldType name="comaSplit" class="solr.TextField"
 positionIncrementGap="100">
<analyzer>
   <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
</analyzer>
</fieldType>

I think the filter code will probably just aggregate all tokens despite the
fact that they are spread over multiple values.
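
Roughly, the expected analysis would then be (untested sketch):

  input values : c | c++ | php,java,.net
  facet tokens : c | c++ | php | java | .net

i.e. the comma-joined entry contributes one token per skill, and each token
becomes its own facet value.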

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jan 17, 2013 at 2:33 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 18 January 2013 00:31, anurag.jain anurag.k...@gmail.com wrote:
 
[ { last_name : jain, training_skill:[c, c++,
 php,java,.net]
  }
  ]
 
  actually i want to tokenize in   c c++ php java .net

 What do you mean by tokenize in this case? It has
 been a while since I had occasion to use JSON input,
 and also do not remember which Solr version introduced
 this, but with a JSON array mapped to a multi-valued
 Solr field, you should get one value per entry in the array.
 http://wiki.apache.org/solr/UpdateJSON#Update_Commands
 seems to be in agreement.

  so through this i can make them as facet.
 
 
  but problem is in list
  training_skill:[c, c++, *php,java,.net*]

 Faceting should be straightforward. Are you not
 seeing the behaviour described above? Could
 you describe the issues that you are facing in
 more detail?

 Regards,
 Gora



Re: MultiValue

2013-01-17 Thread anurag.jain
@Alexandre Rafalovitch Thanks. 

yeah you got my point.


training_skill:[c, c++, php, java, .net]
but it is not possible for me to split php,java,.net  because the data can
vary and it is very large. i mean i have to perform on 5 line  data. 

it might come as [c++,php,java,.net,c#,ruby, python  java], like that. 

so i have to perform on this list. i just want to ignore the [ , ] 











--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiValue-tp4034305p4034339.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MultiValue

2013-01-17 Thread Alexandre Rafalovitch
Try my suggested field definition and see if it helps with faceting. It
should. Try it on a small example or a fake schema.

But I would still recommend escalating the problem up the chain to an
architect or similar. Because I bet that data is stored in multiple places
(e.g. in the database) and you will hit a real problem later when you will
try to match a particular data/configuration set back to original sources.

Otherwise, like suggested somewhere else in the chain, you can also look at
update.chain and Request Processors. But you will have to write one
yourself for this situation.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jan 17, 2013 at 2:50 PM, anurag.jain anurag.k...@gmail.com wrote:

 @Alexandre Rafalovitch Thanks.

 yeah you got my point.


 training_skill:[c, c++, php, java, .net]
 but it is not possible for me to split php,java,.net  because data can
 very and data is very large. i mean i have to perform on 5 line  data.

 it might come[c++,php,java,.net,c#,ruby, python  java] like that.

 so i have to perform on this list. just want to ignore [  , ]



Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-17 Thread Uwe Reh

Hi Shawn,

don't panic
Due 'historical' reasons, like comparing the different subclasses of 
SolrServer, I have an HttpSolrServer for querys and commits. I've never 
tried to to use the CUSS for anything else than adding documents.


As I wrote, it was a home made problem and not a bug. Sometimes I hope, 
not to be the only dumbass and others may caught in the same trap.


Uwe


Am 17.01.2013 15:52, schrieb Shawn Heisey:

If you are using the same ConcurrentUpdateSolrServer object for all
update interaction with Solr (including commits) and you still have to
do the blockUntilFinished() in your own code before you issue an
explicit commit, that sounds like a bug, and you should put all the
details in a Jira issue.




Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-17 Thread Shawn Heisey

On 1/17/2013 12:38 PM, Chris Hostetter wrote:


: You're not only giving up the ability to monitor things, you're also giving up
: the ability to detect errors.  All exceptions that get thrown by the internals
: of ConcurrentUpdateSolrServer are swallowed, your code will never know they
: happened.  The client log (slf4j with whatever binding  config you chose) may
: have such errors logged, but they are completely undetectable by the code.

This isn't the first time i've seen someone make this claim, but i really
don't understand it -- ConcurrentUpdateSolrServer has a handleError()
method that gets called when an error happens during the async processing.
By default it just logs the exception, if you want to do something more
interesting with it in your code, just subclass ConcurrentUpdateSolrServer
and override that method -- that's the entire point of that method.

The bigger issue is wether your client cod could reasonable do anything
if/when that method is called -- because it's all async, you probably
can't do much more then log/report it in your own custom way instead of
just using org.slf4j.Logger.


I have my update process (using HttpSolrServer) encapsulated in a method 
that has several parts -- deletes, reinserts, a specific kind of partial 
reindex, and inserting new content.  It ends with a commit().  Any 
exceptions that happen down inside this method are either rethrown or 
propagate.  When the method is called, update position information is 
only updated if it returns without throwing an exception.


For my use case, it is enough to know that an error happened, exactly 
where it happened is not critical unless the problem turns out to be in 
the data - a scenario that has not happened so far.  All failures so far 
have been due to the server or Solr being down.


I understand that many people would want to know which update failed.  I 
hope to come up with a way to make this possible with CUSS out of the box.


Do you have an example of how to override handleError that would make 
error detection easy?  IMHO, either that information should be easily 
accessible to someone who's looking at the javadoc for CUSS, or the 
class should provide an out of the box way to detect errors.


I will work on this problem, not just complain about the current state.

Thanks,
Shawn



Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread snake
Ok so is there any other way to stop this problem I am having, where any site
can break solr by deleting their collection?
Seems odd everyone would vote to remove a feature that would make solr more
stable.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034349.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread Walter Underwood
Or a different design.

You can mark collections for deletion, then delete them in an organized, safe 
manner later.

wunder

On Jan 17, 2013, at 12:40 PM, snake wrote:

 Ok so is there any other to stop this problem I am having where any site
 can break solr by delering their collection?
 Seems odd everyone would vote to remove a feature that would make solr more
 stable.
 





Why do I keep seeing org.apache.solr.core.SolrCore execute in the tomcat logs

2013-01-17 Thread eShard
I keep seeing these in the tomcat logs:
Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute
INFO: [Lisa] webapp=/solr path=/admin/logging
params={since=1358453312320wt=jso
n} status=0 QTime=0

I'm just curious:
What is getting executed here? I'm not running any queries against this core
or using it in any way currently.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-do-I-keep-seeing-org-apache-solr-core-SolrCore-execute-in-the-tomcat-logs-tp4034353.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread snake
I think you're not understanding the issue. Imagine www.acme.com has created a
collection.
This resides in d:\acme.com\wwwroot\collections

Then they decide to redo their website, or they get a new developer who
decides not to use collections, or they simply move hosts, so they delete
the old one.
The collection is now gone.
Solr now cannot find the config files for that collection since they are
gone, so solr crashes and breaks every other website on the entire server
that is using solr.
The customers have no idea this will happen, and no knowledge about having to
get collections removed properly etc., so saying they should do this and
that simply won't happen, so it is not a solution.

I need a way to avoid the above scenarios, is it possible?
On Jan 17, 2013 8:43 PM, Walter Underwood [via Lucene] 
ml-node+s472066n4034351...@n3.nabble.com wrote:

 Or a different design.

 You can mark collections for deletion, then delete them in an organized,
 safe manner later.

 wunder

 On Jan 17, 2013, at 12:40 PM, snake wrote:

  Ok so is there any other to stop this problem I am having where any site
  can break solr by delering their collection?
  Seems odd everyone would vote to remove a feature that would make solr
 more
  stable.
 





 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034351.html





--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034354.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread Yonik Seeley
On Thu, Jan 17, 2013 at 3:40 PM, snake r...@michaels.me.uk wrote:
 Ok so is there any other to stop this problem I am having where any site
 can break solr by delering their collection?
 Seems odd everyone would vote to remove a feature that would make solr more
 stable.

I agree.

abortOnConfigurationError was more about a single core... whether the core
would still be loaded if there were config errors.

There *should* be a way to still load other cores if one core has an
error and is not loaded.  If there's not currently, then we should
implement it.

-Yonik
http://lucidworks.com


Questions about boosting

2013-01-17 Thread Shawn Heisey
I've been trying to figure this out on my own, but I've come up empty so 
far.  I need to boost documents from a certain provider.  The idea is 
that if any documents in a result match a separate query (like 
provider:bigbucks), I need to multiply the score by X.  It's important 
that the result set of the actual query is not changed, just the order.


I've tried a few things from the relevancy page on the wiki but so far I 
can't seem to get anything to work.  What syntax should I be using?  Is 
it possible to do this at query time?


Thanks,
Shawn


Re: Why do I keep seeing org.apache.solr.core.SolrCore execute in the tomcat logs

2013-01-17 Thread Alexandre Rafalovitch
You must have an Admin UI open and pointing at Logging section. So, it
sends a ping to see if any new log entries were added.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jan 17, 2013 at 4:00 PM, eShard zim...@yahoo.com wrote:

 I keep seeing these in the tomcat logs:
 Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute
 INFO: [Lisa] webapp=/solr path=/admin/logging
 params={since=1358453312320wt=jso
 n} status=0 QTime=0

 I'm just curious:
 What is getting executed here? I'm not running any queries against this
 core
 or using it in any way currently.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Why-do-I-keep-seeing-org-apache-solr-core-SolrCore-execute-in-the-tomcat-logs-tp4034353.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread snake
my knowledge of solr is pretty limited, I have only been investigating this
in the last couple of days due to this issue.
The way SOLR is implemented in ColdFusion is with a single core, so all
sites run under the same core. I presume a core is like multiple instances?


On Thu, Jan 17, 2013 at 9:03 PM, Yonik Seeley-4 [via Lucene] 
ml-node+s472066n403435...@n3.nabble.com wrote:

 On Thu, Jan 17, 2013 at 3:40 PM, snake [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4034355i=0
 wrote:
  Ok so is there any other to stop this problem I am having where any site
  can break solr by delering their collection?
  Seems odd everyone would vote to remove a feature that would make solr
 more
  stable.

 I agree.

 abortOnConfigurationError was more about a single core... if the core
 would still be loaded if there were config errors.

 There *should* be a way to still load other cores if one core has an
 error and is not loaded.  If there's not currently, then we should
 implement it.

 -Yonik
 http://lucidworks.com


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034355.html




-- 

--

Russ Michaels

www.bluethunderinternet.com  : Business hosting services  solutions
www.cfmldeveloper.com: ColdFusion developer community
www.michaels.me.uk   : my blog
www.cfsearch.com : ColdFusion search engine
**
*skype me* : russmichaels




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-get-abortOnConfigurationError-false-working-tp4034149p4034358.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread Alexandre Rafalovitch
Solr 4 most definitely ignores missing cores (I just ran into that
accidentally again myself). So, if you start Solr and a core directory is
missing, it will survive (but complain).

The other problem is what happens when a customer deletes the account and
the core directory disappears in the middle of an open searcher. I would suggest
some sort of pre-delete trigger that hits the Solr admin interface and unloads
that core first.
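
For example, something along the lines of (core name is hypothetical):

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=acmecore

fired from the hosting control panel before the customer's files are removed.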

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jan 17, 2013 at 4:03 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Thu, Jan 17, 2013 at 3:40 PM, snake r...@michaels.me.uk wrote:
  Ok so is there any other to stop this problem I am having where any site
  can break solr by delering their collection?
  Seems odd everyone would vote to remove a feature that would make solr
 more
  stable.

 I agree.

 abortOnConfigurationError was more about a single core... if the core
 would still be loaded if there were config errors.

 There *should* be a way to still load other cores if one core has an
 error and is not loaded.  If there's not currently, then we should
 implement it.

 -Yonik
 http://lucidworks.com



Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread Shawn Heisey

On 1/17/2013 2:01 PM, snake wrote:

I think your not understanding the issue.Imagine www.acme.com has created a
collection.
This resides in d:\acme.com\wwwroot\collections

Then they decide to redo their website, or they get a new developer who
decides not to use collections, or they simply move hosts, so they delete
the old one.
The collection is now gone.
Solr now cannot find the config files for that collection since they are
gone, so solr crashes and breaks every other website on the entire server
that is using solr.
The customers have no idea this will happen, no knowledge about having to
get collections removed properly etc, so saying they should do this and
that simply wont happen so is not a solution.


Solr has no security measures.  If you are giving customers direct 
access to one or more directories on your Solr server, there are a LOT 
of ways that they can cause you problems, intentionally or not.


By adding a jar to their data directory and referencing it in their 
config, they can do just about anything.  Custom Solr components could 
be written that do one or more of the following:


- Tie up all of Solr's memory and cause it to crash.
- Grant general access to the server as the user that runs solr.
- Utilize a security vulnerability and gain admin access.

Changes need to be checked before implementation.  If a customer wants 
to use custom components, that would require extra scrutiny.  I can't 
think of any way to fully protect your server without requiring human 
intervention for all changes.


Thanks,
Shawn



Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-17 Thread Otis Gospodnetic
Hi,

You certainly can do that, but you'll need to suck all data out of HBase
and index it in Solr first.  And then presumably you'll want to keep the 2
more or less in sync via incremental indexing.  Maybe Lily project can
help?  If not, you'll have to write something that scans HBase and indexes,
say via SolrJ.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 1:26 PM, oakstream
mike.oa...@oakstreamsystems.comwrote:

 Hello,
 I have point data (lat/lon) stored in hbase/hadoop and would like to query
 the data spatially with polygons.  (If I pass in a few polygons find me all
 the records that exist within these polygons.  I need it to support
 polygons
 not just box queries).  Hadoop doesn't really have much support that I
 could
 find for these types of queries.  I was wondering if I could leverage SOLR
 spatial 4 and create spatial indexes on the hbase data that could be used
 to
 query this data?? I need near real-time answers (within a couple seconds).

 If anyone has any thoughts on this I would greatly appreciate them.

 Thank you



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOlr 3.5 and sharding

2013-01-17 Thread Erick Erickson
Hmmm, Maybe I'm finally getting it.

Right, that does seem odd. I would expect you to get 4x the number of
docs on any particular shard/replica in this situation.

What happens you look at the Solr logs for each partition? You should
be able to glean the num results from the logs. I guess there are a
couple of possibilities
1 each machine actually returns N documents, but the aggregator does
something weird and gives you  4X. Indicating something's peculiar
with the Solr aggregation.
2 you find that, for some reason, you aren't getting the same count
_at the server level_, indicating your assertion that all the indexes
are identical isn't valid.

All of which means I'm pretty much out of ideas, it's hunt-and-seek time.

Erick

On Thu, Jan 17, 2013 at 10:53 AM, Jean-Sebastien Vachon
jean-sebastien.vac...@wantedanalytics.com wrote:
 Hi Erick,

 It looks like we are saying the exact same thing but with different terms ;)
 I looked at the Solr glossary and you might be right.. maybe I should talk 
 about partitions instead of shards.

 Since my last message, I`ve configured the replication between the master and 
 slave and everything is working fine except for my original question about 
 the number of documents not matching my expectations.

 I`ll try to clarify a few things and come back to this question...

 Machine A (which I called the master node) is where the indexation takes 
 place.
 It consist of four Solr instances that will (eventually ) contain  1/4 of the 
 entire collection. It`s just that, at this moment, since I have no control on 
 which partition a given document is sent, I made copies of the same index for 
 all partitions. Each Solr instance  has a replication handler configured. I 
 will eventually get to the point of changing the indexation code to 
 distribute documents evenly on all partitions but the person who can give me 
 access to this portion is not available right now so I can do nothing about 
 it.

 Machine B has the same four shards setup to be replicas of the corresponding 
 shard on machine A.
 Machine B also contains another Solr instance with the default handler 
 configured to use the four local partitions. This instance receives client`s 
 requests, collect the results from each partition and then select the best 
 matches to form the final response. We intent to add new slaves being exact 
 copies of Machine B and load balance clients requests on all slaves.

 My original question was that if each partition has 1000 documents matching a 
 certain keyword and that I know all partitions have the same content then I 
 was expecting to receive 4*1000 documents for the same keyword. But that is 
 not the case.
 The replication is not an issue here since the same request on the master 
 node will give me the same result.

 Each shard when called individually will give 1000 documents. But when I call 
 them using the shards=xxx parameters then I am getting a little less than 
 4000 documents. I was just curious to know why this was happening... Is this 
 a bug? Or something I am misunderstanding...

 Thanks for your time and contribution to Solr!

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: January-17-13 8:46 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOlr 3.5 and sharding

 You're still confusing shards (or at least mixing up the terminology) with 
 simple replication. Shards are when you split up the index into several sub 
 indexes and configure the sub-indexes to know about each other. Say you 
 have 1M docs in 2 shards. 500K of them would go on one shard and 500K on the 
 other. But logically you have a single index of 1M docs. So the two shards 
 have to know about each other and when you send a request to one of them, it 
 automatically queries the other (as well as itself), collects the response 
 and combines them, returning the top N to the requester.

 This is totally different from replication. In replication (master/slave), 
 each node has all 1M documents. Each node can work totally in isolation. An 
 incoming request is handled by the slave without contacting any other node.

 If you're copying around indexes AND configuring them as though they were 
 shards, each request will be distributed to all shards and the results 
 collated, giving you the same doc repeatedly in your result set.

 If you have no access to the indexing code, you really can't go to a sharded 
 setup.

 Polling is when the slaves periodically ask the master has anything 
 changed? If so then the slave pulls down the changes. The polling interval 
 is configured in solrconfig.xml _on the slave_. So let's say you index docs 
 to the master. For some interval, until the slaves poll the master and get an 
 updated index, the number of searchable docs on the master will be different 
 than for the slaves. Additionally, you may have the issue of the polling 
 intervals for the slaves being offset from one another, so for some brief 
 

Re: Solr cache considerations

2013-01-17 Thread Erick Erickson
filterCache: This is bounded by 1M * (maxDoc) / 8 * (num filters in
cache). Notice the /8. This reflects the fact that the filters are
represented by a bitset on the _internal_ Lucene ID. UniqueId has no
bearing here whatsoever. This is, in a nutshell, why warming is
required, the internal Lucene IDs may change. Note also that it's
maxDoc, the internal arrays have holes for deleted documents.

Note this is an _upper_ bound, if there are only a few docs that
match, the size will be (num of matching docs) * sizeof(int)).
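
A rough worked example of that bound (numbers assumed): with maxDoc = 10M,
one cached filter bitset is about 10,000,000 / 8 = 1.25 MB, so a filterCache
sized at, say, 512 entries tops out around 512 * 1.25 MB = 640 MB.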

fieldValueCache. I don't think so, although I'm a bit fuzzy on this.
It depends on whether these are per-segment caches or not. Any per
segment cache is still valid.

Think of documentCache as intended to hold the stored fields while
various components operate on it, thus avoiding repeatedly fetching
the data from disk. It's _usually_ not too big a worry.

About hard-commits once a day. That's _extremely_ long. Think instead
of committing more frequently with openSearcher=false. If nothing
else, your transaction log will grow lots and lots and lots. I'm
thinking on the order of 15 minutes, or possibly even much less. With
softCommits happening more often, maybe every 15 seconds. In fact, I'd
start out with soft commits every 15 seconds and hard commits
(openSearcher=false) every 5 minutes. The problem with hard commits
being once a day is that, if for any reason the server is interrupted,
on startup Solr will try to replay the entire transaction log to
assure index integrity. Not to mention that your tlog will be huge.
Not to mention that there is some memory usage for each document in
the tlog. Hard commits roll over the tlog, flush the in-memory tlog
pointers, close index segments, etc.
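
Something along these lines in solrconfig.xml would express the 15s/5min
suggestion above (a sketch, values are of course tunable):

<autoSoftCommit>
  <maxTime>15000</maxTime>        <!-- soft commit every 15 seconds -->
</autoSoftCommit>
<autoCommit>
  <maxTime>300000</maxTime>       <!-- hard commit every 5 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>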

Best
Erick

On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh isaac.he...@gmail.com wrote:
 Hi,

 I am going to build a big Solr (4.0?) index, which holds some dozens of
 millions of documents. Each document has some dozens of fields, and one big
 textual field.
 The queries on the index are non-trivial, and a little-bit long (might be
 hundreds of terms). No query is identical to another.

 Now, I want to analyze the cache performance (before setting up the whole
 environment), in order to estimate how much RAM will I need.

 filterCache:
 In my scenariom, every query has some filters. let's say that each filter
 matches 1M documents, out of 10M. Does the estimated memory usage should be
 1M * sizeof(uniqueId) * num-of-filters-in-cache?

 fieldValueCache:
 Due to the difference between queries, I guess that fieldValueCache is the
 most important factor on query performance. Here comes a generic question:
 I'm indexing new documents to the index constantly. Soft commits will be
 performed every 10 mins. Does it say that the cache is meaningless, after
 every 10 minutes?

 documentCache:
 enableLazyFieldLoading will be enabled, and fl contains a very small set
 of fields. BUT, I need to return highlighting on about (possibly) 20
 fields. Does the highlighting component use the documentCache? I guess that
 highlighting requires the whole field to be loaded into the documentCache.
 Will it happen only for fields that matched a term from the query?

 And one more question: I'm planning to hard-commit once a day. Should I
 prepare to a significant RAM usage growth between hard-commits? (consider a
 lot of new documents in this period...)
 Does this RAM comes from the same pool as the caches? An OutOfMemory
 exception can happen is this scenario?

 Thanks a lot.


Re: searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses

2013-01-17 Thread Erick Erickson
I think all you need to do is escape the hyphen, or have you tried that already?
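
For example (untested against your handler and analysis chain), either

  q=\-0004A-0436
or
  q="-0004A-0436"

should stop the leading hyphen from being read as a prohibited clause.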

Best
Erick

On Thu, Jan 17, 2013 at 1:38 PM, geeky2 gee...@hotmail.com wrote:
 hello

 environment: solr 3.5

 problem statement:

 i have a requirement to search for part numbers that start with a dash /
 hyphen.

 example q= term: *-0004A-0436*

 example query:

 http://some_url:some_port/some_core/select?facet=falsesort=score+desc%2C+rankNo+asc%2C+partCnt+descstart=0q=*-0004A-0436*+itemType%3A1wt=xmlqt=itemModelNoProductTypeBrandSearchrows=4

 what is happening: query is returning a huge results set.  in reality there
 is one (1) and only one record in the database with this part number.

 i believe this is happening because the dash is being interpreted by the
 query parser as a prohibited clause and the effective result is, give me
 everything that does NOT have this part number.

 how is this handled so that the search is conducted for the actual part:
 -0004A-0436

 thx
 mark

 more information:

 request handler in solrconfig.xml

   requestHandler name=itemModelNoProductTypeBrandSearch
 class=solr.SearchHandler default=false
 lst name=defaults
   str name=defTypeedismax/str
   str name=echoParamsall/str
   int name=rows10/int
   str name=qfitemModelNoExactMatchStr^30 itemModelNo^.9
 divProductTypeDesc^.8 plsBrandDesc^.5/str
   str name=q.alt*:*/str
   str name=sortscore desc, rankNo desc, partCnt desc/str
   str name=facettrue/str
   str name=facet.fielditemModelDescFacet/str
   str name=facet.fieldplsBrandDescFacet/str
   str name=facet.fielddivProductTypeIdFacet/str
 /lst
 lst name=appends
 /lst
 lst name=invariants
 /lst
   /requestHandler


 field information from schema.xml (if helpful)

 field name=itemModelNoExactMatchStr type=text_general_trim
 indexed=true stored=true/

 field name=itemModelNo type=text_en_splitting indexed=true
 stored=true omitNorms=true/

 field name=divProductTypeDesc type=text_general_edge_ngram
 indexed=true stored=true multiValued=true/

 field name=plsBrandDesc type=text_general_edge_ngram indexed=true
 stored=true multiValued=true/


 fieldType name=text_general_trim class=solr.TextField
 positionIncrementGap=100
   analyzer type=index
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.TrimFilterFactory/
   /analyzer
   analyzer type=query
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
 ignoreCase=true expand=true/
 filter class=solr.LowerCaseFilterFactory/
   /analyzer
 /fieldType

 fieldType name=text_en_splitting class=solr.TextField
 positionIncrementGap=100
   analyzer type=index
 tokenizer class=solr.WhitespaceTokenizerFactory/


 filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt enablePositionIncrements=true/
 filter class=solr.PatternReplaceFilterFactory pattern=\.
 replacement= replace=all/
 filter class=solr.EdgeNGramFilterFactory minGramSize=3
 maxGramSize=15 side=front/
 filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1 catenateAll=1 splitOnCaseChange=1
 preserveOriginal=1/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.KeywordMarkerFilterFactory
 protected=protwords.txt/
 filter class=solr.PorterStemFilterFactory/
   /analyzer

 fieldType name=text_general_edge_ngram class=solr.TextField
 positionIncrementGap=100
   analyzer type=index
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt enablePositionIncrements=true/
 filter class=solr.SynonymFilterFactory
 synonyms=synonyms_SHC.txt ignoreCase=true expand=true/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.EdgeNGramFilterFactory minGramSize=3
 maxGramSize=15 side=front/
   /analyzer
   analyzer type=query
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt enablePositionIncrements=true/
 filter class=solr.LowerCaseFilterFactory/
   /analyzer
 /fieldType






 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/searching-for-q-terms-that-start-with-a-dash-hyphen-being-interpreted-as-prohibited-clauses-tp4034310.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr cache considerations

2013-01-17 Thread Tomás Fernández Löbbe
I think fieldValueCache is not per segment, only fieldCache is. However,
unless I'm missing something, this cache is only used for faceting on
multivalued fields


On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson erickerick...@gmail.comwrote:

 filterCache: This is bounded by 1M * (maxDoc) / 8 * (num filters in
 cache). Notice the /8. This reflects the fact that the filters are
 represented by a bitset on the _internal_ Lucene ID. UniqueId has no
 bearing here whatsoever. This is, in a nutshell, why warming is
 required, the internal Lucene IDs may change. Note also that it's
 maxDoc, the internal arrays have holes for deleted documents.

 Note this is an _upper_ bound, if there are only a few docs that
 match, the size will be (num of matching docs) * sizeof(int)).

 fieldValueCache. I don't think so, although I'm a bit fuzzy on this.
 It depends on whether these are per-segment caches or not. Any per
 segment cache is still valid.

 Think of documentCache as intended to hold the stored fields while
 various components operate on it, thus avoiding repeatedly fetching
 the data from disk. It's _usually_ not too big a worry.

 About hard-commits once a day. That's _extremely_ long. Think instead
 of committing more frequently with openSearcher=false. If nothing
 else, you transaction log will grow lots and lots and lots. I'm
 thinking on the order of 15 minutes, or possibly even much less. With
 softCommits happening more often, maybe every 15 seconds. In fact, I'd
 start out with soft commits every 15 seconds and hard commits
 (openSearcher=false) every 5 minutes. The problem with hard commits
 being once a day is that, if for any reason the server is interrupted,
 on startup Solr will try to replay the entire transaction log to
 assure index integrity. Not to mention that your tlog will be huge.
 Not to mention that there is some memory usage for each document in
 the tlog. Hard commits roll over the tlog, flush the in-memory tlog
 pointers, close index segments, etc.

 Best
 Erick

 On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh isaac.he...@gmail.com
 wrote:
  Hi,
 
  I am going to build a big Solr (4.0?) index, which holds some dozens of
  millions of documents. Each document has some dozens of fields, and one
 big
  textual field.
  The queries on the index are non-trivial, and a little-bit long (might be
  hundreds of terms). No query is identical to another.
 
  Now, I want to analyze the cache performance (before setting up the whole
  environment), in order to estimate how much RAM will I need.
 
  filterCache:
  In my scenariom, every query has some filters. let's say that each filter
  matches 1M documents, out of 10M. Does the estimated memory usage should
 be
  1M * sizeof(uniqueId) * num-of-filters-in-cache?
 
  fieldValueCache:
  Due to the difference between queries, I guess that fieldValueCache is
 the
  most important factor on query performance. Here comes a generic
 question:
  I'm indexing new documents to the index constantly. Soft commits will be
  performed every 10 mins. Does it say that the cache is meaningless, after
  every 10 minutes?
 
  documentCache:
  enableLazyFieldLoading will be enabled, and fl contains a very small
 set
  of fields. BUT, I need to return highlighting on about (possibly) 20
  fields. Does the highlighting component use the documentCache? I guess
 that
  highlighting requires the whole field to be loaded into the
 documentCache.
  Will it happen only for fields that matched a term from the query?
 
  And one more question: I'm planning to hard-commit once a day. Should I
  prepare to a significant RAM usage growth between hard-commits?
 (consider a
  lot of new documents in this period...)
  Does this RAM comes from the same pool as the caches? An OutOfMemory
  exception can happen is this scenario?
 
  Thanks a lot.



Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-17 Thread oakstream
Thanks for your response!  I appreciate it.  

There will be cases where I want to AND or OR the query between HBASE and
Lucene.  Would it make sense to custom code querying both repositories at
the same time or sequentially? Or are there any tools out there to do
this?

Basically I'm thinking that HBASE will keep the majority of my data columns
and lucene will keep the index and a unique pointer to the HBASE record. 

Like
HBASE

UID = 12345, COL1, COL2, COL3, COL4, COL5, COL6

LUCENE
ID = 999, UID = 12345 , INDEX Columns (LAT/LON)

My query would be something like where lat/lon in (Polygon) AND COL3 = 'ABC'

Would this kind of setup make sense?  Is there a better way?

I'll be working with Terabytes of data

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307p4034400.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
The documents are individual products which come from 1 or more vendors.
Example: a 'toy spiderman doll' is sold by 2 vendors, that is 1 document.
Most fields are multi valued (short_description from each of the 2 vendors,
long_description, product_name, vendor, etc. the same).

I'd like to collapse on the vendor in an attempt to ensure that vast
collections of books, music, and movies, by just a few vendors, don't
overwhelm the results simply due to the fact that they have every search
term imaginable due to the sheer volume of books, CDs, and DVDs, in relation
to other product items.

But in this case there is clearly 1...N vendors per document, solidly a
multi-valued field. And it's hard to put a maximum number of vendors
possible.

Thanks,
Dave


-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Friday, January 18, 2013 2:32 AM
To: solr-user
Subject: Re: Field Collapsing - Anything in the works for multi-valued
fields?

David,

What's the documents and the field? It can help to suggest workaround.


On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com wrote:

 I want to configure Field Collapsing, but my target field is 
 multi-valued (e.g. the field I want to group on has a variable # of 
 entries per document, 1-N entries).

 I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that 
 grouping doesn't support multi-valued fields yet.

 Anything in the works on that front by chance?  Any common work-arounds?





--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com



Need 'stupid beginner' help with SolrCloud

2013-01-17 Thread Shawn Heisey
I'm trying to get a 2-node SolrCloud install off the ground with the 4.1 
branch.  This is a new project for a different system than my existing 
Solr 3.5.0 setup.  It will have one shard and two replicas.


I have part of the example in /opt/mbsolr4 -- jetty, the war file, logs, 
etc.  This is the CWD.


I want all my config and data to live in /index/mbsolr4, so I am using 
-Dsolr.solr.home=/index/mbsolr4.  This setup mirrors what I am doing for 
upgrading the other system from 3.5.0 to 4.1, which is not using SolrCloud.


There is also a separate 3-node zookeeper ensemble, with two of those 
nodes living on the two Solr servers.


What do I need in the solr home (/index/mbsolr4) before I start Solr? 
If I was not using SolrCloud, I would put solr.xml in there, pointing at 
directories relative to that location.


I'm going to have multiple collections.  Some of those collections will 
use the same config/schema, others will use slightly different versions. 
 I have worked out the zkHost value that I will need:


-DzkHost=mbzoo1:2181,mbzoo2:2181,mbzoo3:2181/mbsolr1

I have both Solr servers started and talking to zookeeper, but there are 
no collections so the UI doesn't work.


Are the following options enough for me to get my first config  
collection into zookeeper/solrcloud -- assuming the config is right?  Do 
I need numShards and the replica count at this phase?


-Dbootstrap_confdir=/index/mbsolr4/bootstrapconf
-Dcollection.configName=mbbasecfg

Thanks,
Shawn


Re: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread Otis Gospodnetic
Hi,

Instead of the multi-valued fields, would a parent-child setup work for you here?

See http://search-lucene.com/?q=solr+joinfc_type=wiki

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 8:04 PM, David Parks davidpark...@yahoo.com wrote:

 The documents are individual products which come from 1 or more vendors.
 Example: a 'toy spiderman doll' is sold by 2 vendors, that is 1 document.
 Most fields are multi valued (short_description from each of the 2 vendors,
 long_description, product_name, vendor, etc. the same).

 I'd like to collapse on the vendor in an attempt to ensure that vast
 collections of books, music, and movies, by just a few vendors, don't
 overwhelm the results simply due to the fact that they have every search
 term imaginable due to the sheer volume of books, CDs, and DVDs, in
 relation
 to other product items.

 But in this case there is clearly 1...N vendors per document, solidly a
 multi-valued field. And it's hard to put a maximum number of vendors
 possible.

 Thanks,
 Dave


 -Original Message-
 From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
 Sent: Friday, January 18, 2013 2:32 AM
 To: solr-user
 Subject: Re: Field Collapsing - Anything in the works for multi-valued
 fields?

 David,

 What's the documents and the field? It can help to suggest workaround.


 On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com
 wrote:

  I want to configure Field Collapsing, but my target field is
  multi-valued (e.g. the field I want to group on has a variable # of
  entries per document, 1-N entries).
 
  I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that
  grouping doesn't support multi-valued fields yet.
 
  Anything in the works on that front by chance?  Any common work-arounds?
 
 
 


 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com




Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-17 Thread Otis Gospodnetic
You'd want to do your Solr spatial query, get IDs from the index, and then
*after* that do a multi get against your HBase table with top N IDs from
Solr's response and thus get the data back to the caller.  I don't know
how fast multi gets are, what the limitations are, etc.  Maybe somebody
else can address that.
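
A rough sketch of that multi get step (table, field and variable names are
made up):

List<Get> gets = new ArrayList<Get>();
for (SolrDocument d : solrResponse.getResults()) {
  // assumes Solr stores the HBase row key in a "uid" field
  gets.add(new Get(Bytes.toBytes((String) d.getFieldValue("uid"))));
}
HTable table = new HTable(conf, "locations");
Result[] rows = table.get(gets);  // one batched call instead of N single gets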

Alternatively, I suppose you could implement a custom collector that does
gets as matching documents are being collected by Solr.  I don't recall the
class/interface you'd need to implement off the top of my head.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 8:01 PM, oakstream
mike.oa...@oakstreamsystems.comwrote:

 Thanks for your response!  I appreciate it.

 There will be cases where I want to AND or OR the query between HBASE and
 Lucene.  Would it make sense to custom code querying both repositories at
 the same time or sequentiallyOr are there any tools out there to do
 this?

 Basically I'm thinking that HBASE will keep the majority of my data columns
 and lucene will keep the index and a unique pointer to the HBASE record.

 Like
 HBASE

 UID = 12345, COL1, COL2, COL3, COL4, COL5, COL6

 LUCENE
 ID = 999, UID = 12345 , INDEX Columns (LAT/LON)

 My query would be something like where lat/lon in (Polygon) AND COL3 =
 'ABC'

 Would this kind of setup make sense?  Is there a better way?

 I'll be working with Terabytes of data

 Thanks



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307p4034400.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need 'stupid beginner' help with SolrCloud

2013-01-17 Thread Mark Miller
There are a couple ways you can proceed. You can preconfigure some SolrCores in 
solr.xml. Even if you don't, you want a solr.xml, because that is where a lot 
of cloud properties are defined. Or you can use the collections API or the core 
admin API.

I guess I'd recommend the collections API.

You have a couple options for getting in config. I'd recommend using the ZkCli 
tool to upload each of your config sets: 
http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper

After that, use the collections API to create the necessary cores on each node.
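
Roughly, that would look something like this (paths, ports and names are just
examples based on your message -- check the wiki page for the exact options):

java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*" \
     org.apache.solr.cloud.ZkCLI -cmd upconfig \
     -zkhost mbzoo1:2181,mbzoo2:2181,mbzoo3:2181/mbsolr1 \
     -confdir /index/mbsolr4/bootstrapconf -confname mbbasecfg

http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=2&collection.configName=mbbasecfg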

Another option is to set up solr.xml like you would locally, then start with 
-Dbootstrap_conf=true and it will duplicate your local config and collection 
setup in ZooKeeper.

- Mark

On Jan 17, 2013, at 9:10 PM, Shawn Heisey s...@elyograg.org wrote:

 I'm trying to get a 2-node SolrCloud install off the ground with the 4.1 
 branch.  This is a new project for a different system than my existing Solr 
 3.5.0 setup.  It will have one shard and two replicas.
 
 I have part of the example in /opt/mbsolr4 -- jetty, the war file, logs, etc. 
  This is the CWD.
 
 I want all my config and data to live in /index/mbsolr4, so I am using 
 -Dsolr.solr.home=/index/mbsolr4.  This setup mirrors what I am doing for 
 upgrading the other system from 3.5.0 to 4.1, which is not using SolrCloud.
 
 There is also a separate 3-node zookeeper ensemble, with two of those nodes 
 living on the two Solr servers.
 
 What do I need in the solr home (/index/mbsolr4) before I start Solr? If I 
 was not using SolrCloud, I would put solr.xml in there, pointing at 
 directories relative to that location.
 
 I'm going to have multiple collections.  Some of those collections will use 
 the same config/schema, others will use slightly different versions.  I have 
 worked out the zkHost value that I will need:
 
 -DzkHost=mbzoo1:2181,mbzoo2:2181,mbzoo3:2181/mbsolr1
 
 I have both Solr servers started and talking to zookeeper, but there are no 
 collections so the UI doesn't work.
 
 Are the following options enough for me to get my first config  collection 
 into zookeeper/solrcloud -- assuming the config is right?  Do I need 
 numShards and the replica count at this phase?
 
 -Dbootstrap_confdir=/index/mbsolr4/bootstrapconf
 -Dcollection.configName=mbbasecfg
 
 Thanks,
 Shawn



build CMIS compatible Solr

2013-01-17 Thread Nicholas Li
hi

I am new to solr and I would like to use Solr as my document server, plus
search engine. But Solr is not CMIS compatible (while it should not be, as
it is not built as a pure document management server). In that sense, I
would build another layer on top of Solr so that the exposed interface would
be CMIS compatible.

I did some investigation and it looks like OpenCMIS is one of the choices. My
next step would be to build this CMIS bridge layer, which can accept the
request as a CMIS request, then, within the CMIS implementation, marshal it
into a Solr-compatible request and send it to Solr. Finally, marshal the
Solr response into a CMIS-compatible response.

Is my logic right?

And is there any other library besides OpenCMIS that can do this job?

cheers.
Nick


Re: Questions about boosting

2013-01-17 Thread Jack Krupansky

Start with Query Elevation and see if that helps:
http://wiki.apache.org/solr/QueryElevationComponent

Index-time document boost is a possibility.

Maybe an ExternalFileField where every document could have a dynamic boost 
value that you add with a boost function.
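A rough sketch of that last idea (the field, type and file names are made up;
see the ExternalFileField javadocs for the specifics):

<!-- schema.xml -->
<fieldType name="extBoost" class="solr.ExternalFileField" keyField="id"
           defVal="1" stored="false" indexed="false"/>
<field name="provider_boost" type="extBoost"/>

# data/external_provider_boost -- one "uniqueKey=value" line per boosted doc
doc1=10.0

# then reference it as a multiplicative boost function, e.g. with edismax:
#   ...&defType=edismax&boost=provider_boost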


-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Thursday, January 17, 2013 4:11 PM
To: solr-user@lucene.apache.org
Subject: Questions about boosting

I've been trying to figure this out on my own, but I've come up empty so
far.  I need to boost documents from a certain provider.  The idea is
that if any documents in a result match a separate query (like
provider:bigbucks), I need to multiply the score by X.  It's important
that the result set of the actual query is not changed, just the order.

I've tried a few things from the relevancy page on the wiki but so far I
can't seem to get anything to work.  What syntax should I be using?  Is
it possible to do this at query time?

Thanks,
Shawn 



Re: build CMIS compatible Solr

2013-01-17 Thread Gora Mohanty
On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote:
 hi

 I am new to solr and I would like to use Solr as my document server, plus
 search engine. But solr is not CMIS compatible( While it shoud not be, as
 it is not build as a pure document management server).  In that sense, I
 would build another layer beyond Solr so that the exposed interface would
 be CMIS compatible.
[...]

May I ask why? Solr is designed to be a search engine,
which is a very different beast from a document repository.
In the open-source world, Alfresco ( http://www.alfresco.com/ )
already exists, can index into Solr, and supports CMIS-based
access.

Regards,
Gora


Re: searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses

2013-01-17 Thread Jack Krupansky

Or put the term in quotes.
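For example, either of these forms keeps the leading dash from being read as a
prohibited-clause operator by the query parser:

q=\-0004A\-0436
q="-0004A-0436"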

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Thursday, January 17, 2013 6:59 PM
To: solr-user@lucene.apache.org
Subject: Re: searching for q terms that start with a dash/hyphen being 
interpreted as prohibited clauses


I think all you need to do is escape the hyphen, or have you tried that 
already?


Best
Erick

On Thu, Jan 17, 2013 at 1:38 PM, geeky2 gee...@hotmail.com wrote:

hello

environment: solr 3.5

problem statement:

i have a requirement to search for part numbers that start with a dash /
hyphen.

example q= term: *-0004A-0436*

example query:

http://some_url:some_port/some_core/select?facet=falsesort=score+desc%2C+rankNo+asc%2C+partCnt+descstart=0q=*-0004A-0436*+itemType%3A1wt=xmlqt=itemModelNoProductTypeBrandSearchrows=4

what is happening: query is returning a huge result set.  in reality there
is one (1) and only one record in the database with this part number.

i believe this is happening because the dash is being interpreted by the
query parser as a prohibited clause and the effective result is, give me
everything that does NOT have this part number.

how is this handled so that the search is conducted for the actual part:
-0004A-0436

thx
mark

more information:

request handler in solrconfig.xml

  <requestHandler name="itemModelNoProductTypeBrandSearch"
      class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemModelNoExactMatchStr^30 itemModelNo^.9
          divProductTypeDesc^.8 plsBrandDesc^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemModelDescFacet</str>
      <str name="facet.field">plsBrandDescFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
    </lst>
  </requestHandler>


field information from schema.xml (if helpful)

<field name="itemModelNoExactMatchStr" type="text_general_trim"
    indexed="true" stored="true"/>

<field name="itemModelNo" type="text_en_splitting" indexed="true"
    stored="true" omitNorms="true"/>

<field name="divProductTypeDesc" type="text_general_edge_ngram"
    indexed="true" stored="true" multiValued="true"/>

<field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true"
    stored="true" multiValued="true"/>


<fieldType name="text_general_trim" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\."
        replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
        maxGramSize="15" side="front"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
        preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

<fieldType name="text_general_edge_ngram" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
        maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>






--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-for-q-terms-that-start-with-a-dash-hyphen-being-interpreted-as-prohibited-clauses-tp4034310.html
Sent from the Solr - User mailing list archive at Nabble.com. 




Re: What is the difference in defining multiValued on field and or fieldtype?

2013-01-17 Thread Jack Krupansky
Specifying an attribute on the field type makes it the default for any field 
of that type.


Setting multiValued=true on ignored simply allows it to be used for any 
field, whether it is single or multi-valued, and any source data, whether it 
has one or multiple values for that ignored field. Otherwise, you would get 
an error if multiple values were given for an ignored field which had no 
multiValued attribute, while the stated goal is to simply ignore the field 
and its incoming values.
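A small sketch of the interplay (not from the stock schema):

<fieldType name="text_multi" class="solr.TextField" multiValued="true">
  <analyzer><tokenizer class="solr.StandardTokenizerFactory"/></analyzer>
</fieldType>

<field name="tags"  type="text_multi"/>                      <!-- inherits multiValued="true" from the type -->
<field name="title" type="text_multi" multiValued="false"/>  <!-- field-level setting overrides the type default -->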


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Thursday, January 17, 2013 6:20 PM
To: solr-user@lucene.apache.org
Subject: What is the difference in defining multiValued on field and or 
fieldtype?


Hello,

I was looking at the 'ignored' field in the example's schema.xml and
suddenly noticed that its field type has multiValued=true in the
definition. Wiki confirms that it is possible, but does not explain.

What's the difference between defining it on the type and on the field
itself? Because example has it defined on both.

I am confused suddenly, because we now have permutation of 9 different
values (true/false/missing ^ 2) and I am not sure what the exact semantics
is.

I am mostly interested in fieldType/@multiValued=true impact, but curious
about the other permutations.

Thanks,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 



Re: Solr cache considerations

2013-01-17 Thread Isaac Hebsh
Unfortunately, it seems (
http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that
these caches are not per-segment. In this case, I want to (soft) commit
less frequently. Am I right?

Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I
guess it contributes a lot to the time of standard (not only faceted)
queries. The SolrWiki claims that it is primarily used by faceting. What
does that say about complex textual queries?

documentCache:
Erick, after query processing is finished, don't some documents stay in
the documentCache? Can't I use it to accelerate queries that need to
retrieve stored fields of documents? In that case, a big documentCache could
hold more documents..

About commit frequency:
HardCommit: openSearcher=false seems like a nice solution. Where can I read
about this? (I found nothing but one unexplained sentence in the SolrWiki.)
SoftCommit: In my case, the required index freshness is 10 minutes. The
plan to soft commit every 10 minutes is similar to storing all of the
documents in a queue (outside of Solr) and indexing a bulk every 10 minutes.
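(A minimal solrconfig.xml sketch of what I have in mind, assuming the 4.x
autoCommit/autoSoftCommit elements; the times are only examples:)

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>          <!-- hard commit every 5 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>600000</maxTime>          <!-- soft commit / visibility every 10 minutes -->
  </autoSoftCommit>
</updateHandler>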

Thanks.


On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe 
tomasflo...@gmail.com wrote:

 I think fieldValueCache is not per segment, only fieldCache is. However,
 unless I'm missing something, this cache is only used for faceting on
 multivalued fields


 On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  filterCache: This is bounded by 1M * (maxDoc) / 8 * (num filters in
  cache). Notice the /8. This reflects the fact that the filters are
  represented by a bitset on the _internal_ Lucene ID. UniqueId has no
  bearing here whatsoever. This is, in a nutshell, why warming is
  required, the internal Lucene IDs may change. Note also that it's
  maxDoc, the internal arrays have holes for deleted documents.
 
  Note this is an _upper_ bound, if there are only a few docs that
  match, the size will be (num of matching docs) * sizeof(int)).
 
  fieldValueCache. I don't think so, although I'm a bit fuzzy on this.
  It depends on whether these are per-segment caches or not. Any per
  segment cache is still valid.
 
  Think of documentCache as intended to hold the stored fields while
  various components operate on it, thus avoiding repeatedly fetching
  the data from disk. It's _usually_ not too big a worry.
 
  About hard-commits once a day. That's _extremely_ long. Think instead
  of committing more frequently with openSearcher=false. If nothing
  else, you transaction log will grow lots and lots and lots. I'm
  thinking on the order of 15 minutes, or possibly even much less. With
  softCommits happening more often, maybe every 15 seconds. In fact, I'd
  start out with soft commits every 15 seconds and hard commits
  (openSearcher=false) every 5 minutes. The problem with hard commits
  being once a day is that, if for any reason the server is interrupted,
  on startup Solr will try to replay the entire transaction log to
  assure index integrity. Not to mention that your tlog will be huge.
  Not to mention that there is some memory usage for each document in
  the tlog. Hard commits roll over the tlog, flush the in-memory tlog
  pointers, close index segments, etc.
 
  Best
  Erick
 
  On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh isaac.he...@gmail.com
  wrote:
   Hi,
  
   I am going to build a big Solr (4.0?) index, which holds some dozens of
   millions of documents. Each document has some dozens of fields, and one
  big
   textual field.
   The queries on the index are non-trivial, and a little-bit long (might
 be
   hundreds of terms). No query is identical to another.
  
   Now, I want to analyze the cache performance (before setting up the
 whole
   environment), in order to estimate how much RAM will I need.
  
    filterCache:
    In my scenario, every query has some filters. Let's say that each filter
    matches 1M documents, out of 10M. Should the estimated memory usage be
    1M * sizeof(uniqueId) * num-of-filters-in-cache?
  
   fieldValueCache:
   Due to the difference between queries, I guess that fieldValueCache is
  the
   most important factor on query performance. Here comes a generic
  question:
   I'm indexing new documents to the index constantly. Soft commits will
 be
   performed every 10 mins. Does it say that the cache is meaningless,
 after
   every 10 minutes?
  
   documentCache:
   enableLazyFieldLoading will be enabled, and fl contains a very small
  set
   of fields. BUT, I need to return highlighting on about (possibly) 20
   fields. Does the highlighting component use the documentCache? I guess
  that
   highlighting requires the whole field to be loaded into the
  documentCache.
   Will it happen only for fields that matched a term from the query?
  
   And one more question: I'm planning to hard-commit once a day. Should I
   prepare to a significant RAM usage growth between hard-commits?
  (consider a
   lot of new documents in this period...)
   Does this RAM 

Re: What is the difference in defining multiValued on field and or fieldtype?

2013-01-17 Thread Alexandre Rafalovitch
Thank you Jack,

I just realized that perhaps ignored was a bad example. But if I understood
correctly, then I can specify multiValued on the type and not do so on the
field itself and I still get multiValued entries.

That's good to know.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 18, 2013 at 12:19 AM, Jack Krupansky j...@basetechnology.comwrote:

 Specifying an attribute on the field type makes it the default for any
 field of that type.

 Setting multiValued=true on ignored simply allows it to be used for any
 field, whether it is single or multi-valued, and any source data, whether
 it has one or multiple values for that ignored field. Otherwise, you would
 get an error if multiple values were given for an ignored field which had
 no multiValued attribute, while the stated goal is to simply ignore the
 field and its incoming values.

 -- Jack Krupansky

 -Original Message- From: Alexandre Rafalovitch
 Sent: Thursday, January 17, 2013 6:20 PM
 To: solr-user@lucene.apache.org
 Subject: What is the difference in defining multiValued on field and or
 fieldtype?


 Hello,

 I was looking at the 'ignored' field in the example's schema.xml and
 suddenly noticed that its field type has multiValued=true in the
 definition. Wiki confirms that it is possible, but does not explain.

 What's the difference between defining it on the type and on the field
 itself? Because example has it defined on both.

 I am confused suddenly, because we now have permutation of 9 different
 values (true/false/missing ^ 2) and I am not sure what the exact semantics
 is.

 I am mostly interested in fieldType/@multiValued=true impact, but curious
 about the other permutations.

 Thanks,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: What is the difference in defining multiValued on field and or fieldtype?

2013-01-17 Thread Jack Krupansky

Yes.

-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Friday, January 18, 2013 12:26 AM
To: solr-user@lucene.apache.org
Subject: Re: What is the difference in defining multiValued on field and or 
fieldtype?


Thank you Jack,

I just realized that perhaps ignored was a bad example. But if I understood
correctly, then I can specify multiValued on the type and not do so on the
field itself and I still get multiValued entries.

That's good to know.

Regards,
  Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 18, 2013 at 12:19 AM, Jack Krupansky 
j...@basetechnology.comwrote:



Specifying an attribute on the field type makes it the default for any
field of that type.

Setting multiValued=true on ignored simply allows it to be used for any
field, whether it is single or multi-valued, and any source data, whether
it has one or multiple values for that ignored field. Otherwise, you would
get an error if multiple values were given for an ignored field which had
no multiValued attribute, while the stated goal is to simply ignore the
field and its incoming values.

-- Jack Krupansky

-Original Message- From: Alexandre Rafalovitch
Sent: Thursday, January 17, 2013 6:20 PM
To: solr-user@lucene.apache.org
Subject: What is the difference in defining multiValued on field and or
fieldtype?


Hello,

I was looking at the 'ignored' field in the example's schema.xml and
suddenly noticed that its field type has multiValued=true in the
definition. Wiki confirms that it is possible, but does not explain.

What's the difference between defining it on the type and on the field
itself? Because example has it defined on both.

I am confused suddenly, because we now have permutation of 9 different
values (true/false/missing ^ 2) and I am not sure what the exact semantics
is.

I am mostly interested in fieldType/@multiValued=true impact, but curious
about the other permutations.

Thanks,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch

- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





Re: build CMIS compatible Solr

2013-01-17 Thread Nicholas Li
I want to make something like Alfresco, but without that many features.
And I'd like to utilise the searching ability of Solr.

On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote:
  hi
 
  I am new to solr and I would like to use Solr as my document server, plus
  search engine. But solr is not CMIS compatible( While it shoud not be, as
  it is not build as a pure document management server).  In that sense, I
  would build another layer beyond Solr so that the exposed interface would
  be CMIS compatible.
 [...]

 May I ask why? Solr is designed to be a search engine,
 which is a very different beast from a document repository.
 In the open-source world, Alfresco ( http://www.alfresco.com/ )
 already exists, can index into Solr, and supports CMIS-based
 access.

 Regards,
 Gora



Re: Is required=true useless in dynamicField?

2013-01-17 Thread Jack Krupansky
Solr will ignore required for dynamic fields. It will be parsed and 
preserved, but will not affect the check for required fields in an input 
document.


Ditto for default value for a dynamic field.

-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Friday, January 18, 2013 12:08 AM
To: solr-user@lucene.apache.org
Subject: Is required=true useless in dynamicField?

Hello,

Given the definition:
<dynamicField name="addr_*" type="email" multiValued="true" indexed="true"
    stored="true" required="true" />

Does it actually matter whether I specify required? I guess there is no way
to have it enforced, right?

Looking at the Wiki, dynamicField does not actually say what parameters it
cares about, so it probably does not even read it from the definition.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 



Re: Questions about boosting

2013-01-17 Thread Walter Underwood
Have you tried boost query?  bq=provider:fred

wunder

On Jan 17, 2013, at 9:08 PM, Jack Krupansky wrote:

 Start with Query Elevation and see if that helps:
 http://wiki.apache.org/solr/QueryElevationComponent
 
 Index-time document boost is a possibility.
 
 Maybe an ExternalFileField where every document could have a dynamic boost 
 value that you add with a boost function.
 
 -- Jack Krupansky
 
 -Original Message- From: Shawn Heisey
 Sent: Thursday, January 17, 2013 4:11 PM
 To: solr-user@lucene.apache.org
 Subject: Questions about boosting
 
 I've been trying to figure this out on my own, but I've come up empty so
 far.  I need to boost documents from a certain provider.  The idea is
 that if any documents in a result match a separate query (like
 provider:bigbucks), I need to multiply the score by X.  It's important
 that the result set of the actual query is not changed, just the order.
 
 I've tried a few things from the relevancy page on the wiki but so far I
 can't seem to get anything to work.  What syntax should I be using?  Is
 it possible to do this at query time?
 
 Thanks,
 Shawn 






Re: Suggestion that preserve original phrase case

2013-01-17 Thread Selvam
Thanks again Erick. This time I got it working :). In fact your first
response itself had a clear explanation; somehow I did not understand it
completely!
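For anyone hitting this thread later, a minimal sketch of the pair trick Erick
describes (the ':' delimiter is an arbitrary choice, and it assumes the
label_autocomplete field is filled by the indexing code instead of the
copyField from my earlier config):

import java.util.Locale;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
String label = "Hello world";
doc.addField("label", label);
// single keyword token: lowercased form for matching, original form for display
doc.addField("label_autocomplete", label.toLowerCase(Locale.ROOT) + ":" + label);
// the application then shows only the part after the first ':' of each suggestion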


On Thu, Jan 17, 2013 at 6:59 PM, Erick Erickson erickerick...@gmail.comwrote:

 You could write a custom Filter (or perhaps Tokenizer), but I usually
 just do it on the input side before things get sent to Solr.

 I don't think PatternReplaceCharFilterFactory will help, you could
 easily turn the input into original:original, but then you'd need to
 write a custom filter that normalized the left-hand-side but not the
 right-hand-side

 Best
 Erick

 On Tue, Jan 15, 2013 at 11:27 AM, Selvam s.selvams...@gmail.com wrote:
  Thanks Erick, can you tell me how to do the appending
  (lowercaseversion:LowerCaseVersion) before indexing. I tried pattern
  factory filters, but I could not get it right.
 
 
  On Sun, Jan 13, 2013 at 8:49 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  One way I've seen this done is to index pairs like
  lowercaseversion:LowerCaseVersion. You can't push this whole thing
 through
  your field as defined since it'll all be lowercased, you have to produce
  the left hand side of the above yourself and just use KeywordTokenizer
  without LowercaseFilter.
 
  Then, your application displays the right-hand-side of the returned
 token.
 
  Simple solution, not very elegant, but sometimes the easiest...
 
  Best
  Erick
 
 
  On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote:
 
    Hi,

    I have been trying to figure out a way for case insensitive suggestion
    but which should return the original phrase as the result. I am using
    solr 3.5.

    For eg:
    If I index 'Hello world' and search for 'hello' it needs to return
    'Hello world' not 'hello world'. My configurations are as follows:

    New field type:
    <fieldType class="solr.TextField" name="text_auto">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
  
   *Field values*:
    <field name="label" type="text" indexed="true" stored="true"
        termVectors="true" omitNorms="true"/>
    <field name="label_autocomplete" type="text_auto" indexed="true"
        stored="true" multiValued="false"/>
    <copyField source="label" dest="label_autocomplete" />
  
   *Spellcheck Component*:
    <searchComponent name="suggest" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_auto</str>
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="buildOnOptimize">true</str>
        <str name="buildOnCommit">true</str>
        <str name="field">label_autocomplete</str>
      </lst>
    </searchComponent>
  
  
   Kindly share your suggestions to implement this behavior.
  
   --
   Regards,
   Selvam
   KnackForge http://knackforge.com
   Acquia Service Partner
   No. 1, 12th Line, K.K. Road, Venkatapuram,
   Ambattur, Chennai,
   Tamil Nadu, India.
   PIN - 600 053.
  
 
 
 
 
  --
  Regards,
  Selvam
  KnackForge http://knackforge.com
  Acquia Service Partner
  No. 1, 12th Line, K.K. Road, Venkatapuram,
  Ambattur, Chennai,
  Tamil Nadu, India.
  PIN - 600 053.




-- 
Regards,
Selvam
KnackForge http://knackforge.com
Acquia Service Partner
No. 1, 12th Line, K.K. Road, Venkatapuram,
Ambattur, Chennai,
Tamil Nadu, India.
PIN - 600 053.


Re: group.ngroups behavior in response

2013-01-17 Thread Amit Nithian
A new response attribute would be better, but it also complicates the patch
in that it would require a new way to serialize DocSlices, I think
(especially when group.main=true). I was looking to set group.main=true so
that my existing clients don't have to change to parse the grouped
result-set format.

Secondly, while a new response attribute makes sense, the question is
whether numFound should be the number of groups or the total count. To me it
should be the number of groups, because logically that is what the result
set shows, and the new attribute should point to the total.

Thanks
Amit


Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-17 Thread David Smiley (@MITRE.org)
Hi Oakstream,

Coincidentally I've been thinking of porting the geohash prefixtree
intersection algorithm in Lucene 4 spatial to Accumulo (another big-table
system like HBase).  There's a decent chance it'll happen this year, I
think.  That doesn't help your need right now of course, so go with Otis's
advice.

~ David Smiley


oakstream wrote
 Hello,
 I have point data (lat/lon) stored in hbase/hadoop and would like to query
 the data spatially with polygons.  (If I pass in a few polygons find me
 all the records that exist within these polygons.  I need it to support
 polygons not just box queries).  Hadoop doesn't really have much support
 that I could find for these types of queries.  I was wondering if I could
 leverage SOLR spatial 4 and create spatial indexes on the hbase data that
 could be used to query this data?? I need near real-time answers (within a
 couple seconds). 
 
 If anyone has any thoughts on this I would greatly appreciate them.
 
 Thank you





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307p403.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Questions about boosting

2013-01-17 Thread Shawn Heisey
I did try the bq parameter.  Either I'm not using it correctly, or it's 
not making a noticeable difference.  I was not able to find any good 
docs, either.  Can you give me complete instructions on its use?  Can I 
control the boost factor?  Is the boost additive or multiplicative?


For query elevation, don't you have to know in advance the query that a 
user will send?  There's no way for me to know this - we want to be able 
to apply the boost to arbitrary queries.


The source data comes from MySQL, and this is a seven-shard distributed 
index with 74075200 documents as of a few minutes ago.  Although 
ExternalFileField probably wouldn't be impossible, it is rather impractical.


Thanks,
Shawn

On 1/17/2013 10:53 PM, Walter Underwood wrote:

Have you tried boost query?  bq=provider:fred

wunder

On Jan 17, 2013, at 9:08 PM, Jack Krupansky wrote:


Start with Query Elevation and see if that helps:
http://wiki.apache.org/solr/QueryElevationComponent

Index-time document boost is a possibility.

Maybe an ExternalFileField where every document could have a dynamic boost 
value that you add with a boost function.

-- Jack Krupansky

-Original Message- From: Shawn Heisey
Sent: Thursday, January 17, 2013 4:11 PM
To: solr-user@lucene.apache.org
Subject: Questions about boosting

I've been trying to figure this out on my own, but I've come up empty so
far.  I need to boost documents from a certain provider.  The idea is
that if any documents in a result match a separate query (like
provider:bigbucks), I need to multiply the score by X.  It's important
that the result set of the actual query is not changed, just the order.




Re: Questions about boosting

2013-01-17 Thread Walter Underwood
As I understand it, the bq parameter is a full Lucene query, but only used for 
ranking, not for selection. This is the complement of fq.

You can use weighting:  provider:fred^8

This will be affected by idf, so providers with fewer matches will have higher 
weight than those with more matches. This is a bother, but the idf-free 
approach requires Solr 4.0.

wunder

On Jan 17, 2013, at 10:31 PM, Shawn Heisey wrote:

 I did try the bq parameter.  Either I'm not using it correctly, or it's not 
 making a noticeable difference.  I was not able to find any good docs, 
 either.  Can you give me complete instructions in its use?  Can I control the 
 boost factor?  Is the boost additive or multiplicative?
 
 For query elevation, don't you have to know in advance the query that a user 
 will send?  There's no way for me to know this - we want to be able to apply 
 the boost to arbitrary queries.
 
 The source data comes from MySQL, and this is a seven-shard distributed index 
 with 74075200 documents as of a few minutes ago.  Although ExternalFileField 
 probably wouldn't be impossible, it is rather impractical.
 
 Thanks,
 Shawn
 
 On 1/17/2013 10:53 PM, Walter Underwood wrote:
 Have you tried boost query?  bq=provider:fred
 
 wunder
 
 On Jan 17, 2013, at 9:08 PM, Jack Krupansky wrote:
 
 Start with Query Elevation and see if that helps:
 http://wiki.apache.org/solr/QueryElevationComponent
 
 Index-time document boost is a possibility.
 
 Maybe an ExternalFileField where every document could have a dynamic boost 
 value that you add with a boost function.
 
 -- Jack Krupansky
 
 -Original Message- From: Shawn Heisey
 Sent: Thursday, January 17, 2013 4:11 PM
 To: solr-user@lucene.apache.org
 Subject: Questions about boosting
 
 I've been trying to figure this out on my own, but I've come up empty so
 far.  I need to boost documents from a certain provider.  The idea is
 that if any documents in a result match a separate query (like
 provider:bigbucks), I need to multiply the score by X.  It's important
 that the result set of the actual query is not changed, just the order.
 






Re: Questions about boosting

2013-01-17 Thread Shawn Heisey

On 1/17/2013 11:41 PM, Walter Underwood wrote:

As I understand it, the bq parameter is a full Lucene query, but only used for 
ranking, not for selection. This is the complement of fq.

You can use weighting:  provider:fred^8

This will be affected by idf, so providers with fewer matches will have higher 
weight than those with more matches. This is a bother, but the idf-free 
approach requires Solr 4.0.


I am doing my testing on Solr 4.1, so if you can give me the syntax for 
that, I would appreciate it.  My production indexes are 3.5, but once we 
are confident with the 4.1 dev system, we'll upgrade.


The provider field has omitTermFreqAndPositions=true defined, but the 
fields that typically get searched don't omit anything, so IDF probably 
still applies in the aggregate.


On a related note, I have rather extreme length variation in my fields, 
so I see quite a lot of weird results due to very short metadata.  Is 
there any way to lessen the impact of lengthNorm without eliminating it 
entirely?  If not, is there any way to eliminate lengthNorm without also 
disabling index-time boosts?  At this moment I am not doing index-time 
boosting, but business requirements may change that in the future.
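One direction I'm considering is a custom Similarity that keeps the index-time
boost but flattens the length factor -- a sketch against the Lucene/Solr 4.x
API, untested:

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class GentleLengthNormSimilarity extends DefaultSimilarity {
  @Override
  public float lengthNorm(FieldInvertState state) {
    int numTerms = state.getLength() - state.getNumOverlap();
    // default is boost / sqrt(numTerms); a fourth root flattens the length effect
    return state.getBoost() * (float) (1.0 / Math.sqrt(Math.sqrt(numTerms)));
  }
}

and in schema.xml (globally, or per fieldType in 4.x):

<similarity class="com.example.GentleLengthNormSimilarity"/>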


Thanks,
Shawn



Re: Questions about boosting

2013-01-17 Thread Shawn Heisey

On 1/17/2013 11:41 PM, Walter Underwood wrote:

As I understand it, the bq parameter is a full Lucene query, but only used for 
ranking, not for selection. This is the complement of fq.

You can use weighting:  provider:fred^8


I tried bq=ip:sc^1000 and it doesn't seem to be making any difference. 
Even if I add fq=ip:sc, I don't see any mention of bq, ip, sc, or 1000 
in the debugQuery output.


This is the case on both 3.5 and 4.1.  In case it was caused by omitting 
termfreq and positions on the field I'm using in the bq, I tried a 
couple of other fields that don't omit anything and bq seems to be 
having no effect at all.


Thanks,
Shawn



Re: Large data importing getting rollback with solr

2013-01-17 Thread Gora Mohanty
On 18 January 2013 12:49, ashimbose ashimb...@gmail.com wrote:
 Hi Otis,

 Thank you for your reply.

 But I am unable to get any search result related to the error code. Its not
 response for more than 168 Data Source. I have tested it. If you have any
 other solution please let me know.

Not sure about the limit on 168 data sources in
DIH, but I am curious as to why you need that
many? Do you have that many different mysql
databases that you are indexing from?

Regards,
Gora


Re: Questions about boosting

2013-01-17 Thread Mikhail Khludnev
Colleagues,
FWIW, bq is a DisMax parser feature. Shawn, to approach the boosting syntax
with the standard parser you need something like q=foo:bar ip:sc^1000.
Specifying ^1000 in bq makes no sense ever. If you show your query params and
debugQuery output, it would be much easier for us to help you.
PS: omitting termfreqs and positions doesn't impact query-time boosting at
all. The closest caveat is that disabling norms at index time kills _index_-time
boosting.
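For example, something along these lines (the ip:sc term is from Shawn's
mails; the catchall field and query text are made up), with debugQuery=true
to see the effect:

# dismax/edismax: bq is honored as an additive boost query
q=some query&defType=edismax&qf=catchall&bq=ip:sc&debugQuery=true

# standard (lucene) parser: boost an optional clause inside q itself
q=catchall:(some query) ip:sc^1000&debugQuery=true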


On Fri, Jan 18, 2013 at 11:10 AM, Shawn Heisey s...@elyograg.org wrote:

 On 1/17/2013 11:41 PM, Walter Underwood wrote:

 As I understand it, the bq parameter is a full Lucene query, but only
 used for ranking, not for selection. This is the complement of fq.

 You can use weighting:  provider:fred^8


 I tried bq=ip:sc^1000 and it doesn't seem to be making any difference.
 Even if I add fq=ip:sc, I don't see any mention of bq, ip, sc, or 1000 in
 the debugQuery output.

 This is the case on both 3.5 and 4.1.  In case it was caused by omitting
 termfreq and positions on the field I'm using in the bq, I tried a couple
 of other fields that don't omit anything and bq seems to be having no
 effect at all.

 Thanks,
 Shawn




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Large data importing getting rollback with solr

2013-01-17 Thread ashimbose
Hi Gora ,

Thank you for your quick reply.

I have only one data source, but more than 300 tables. Each table I
have put in an individual entity in data-config.xml.

But when I am trying to do a full import, it's showing that many entries, as in
<str name="Total Requests made to DataSource">169</str>

This 169 means I took 169 tables from my data source, and for each of the 169
tables I created an individual entity in my data-config.xml file.

I am not sure if I did something wrong. Please let me know.

My sample data-config.xml I am posting as below..

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" name="sampleDB"
      driver="com.ibm.optim.connect.jdbc.NvDriver"
      url="jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB"
      user="" password=""/>
  <document name="headwords">
    <entity name="CUSTOMER" dataSource="sampleDB" query="SELECT * FROM
        CUSTOMER" transformer="RegexTransformer">
      <field column="ID" name="ID"/>
      <field column="ADDRESS" name="ADDRESS"/>
      <field column="SIGNON_TYPE" name="SIGNON_TYPE"/>
      <field column="NAME" name="NAME"/>
    </entity>
    .
    .
    .
    .
  </document>
</dataConfig>

Thank you

Regards,
Ashim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Large-data-importing-getting-rollback-with-solr-tp4034075p4034466.html
Sent from the Solr - User mailing list archive at Nabble.com.