Re: Problem with solr deployment on weblogic 10.3

2013-04-25 Thread Shawn Heisey
On 4/24/2013 10:47 PM, Radhakrishna Repala wrote:
 I'm new to Solr. While deploying Solr in WebLogic, I got the following exception.
 Please help me in this regard.
 
 Error 500--Internal Server Error
 
 java.lang.NoSuchMethodError: replaceEach

The replaceEach method is included in Apache commons-lang 2.4 and later.
 The solr.war file contains a jar for this library, version 2.6 for the
latest Solr versions.

From what I have been able to determine using Google, weblogic 10.3 uses
an older version of the Apache commons-lang library, and that is taking
precedence, so the version included in Solr is not being used.

It looks like the solution is adding some config to the weblogic.xml
file in the solr.war so that weblogic prefers application classes.  I
filed SOLR-4762.  I do not know if this change might have unintended
consequences.
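
(For reference, the kind of weblogic.xml override involved looks roughly like
the sketch below. prefer-web-inf-classes is a standard WebLogic container
descriptor, but treat the snippet as an illustration rather than the exact
SOLR-4762 patch:)

<!-- WEB-INF/weblogic.xml inside solr.war -->
<weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app">
  <container-descriptor>
    <!-- Prefer jars bundled in WEB-INF/lib (e.g. commons-lang 2.6)
         over the server's own copies of the same classes. -->
    <prefer-web-inf-classes>true</prefer-web-inf-classes>
  </container-descriptor>
</weblogic-web-app>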

http://ananthkannan.blogspot.com/2009/08/beware-of-stringutilscontainsignorecase.html

https://issues.apache.org/jira/browse/SOLR-4762

Thanks,
Shawn



Re: Update on shards

2013-04-25 Thread Arkadi Colson

Hi

It seems not to work in my case. We are using the solr php module for 
talking to Solr. Currently we have 2 collections, 'intradesk' and 'lvs', 
across 10 Solr hosts (shards: 5 - repl: 2). Because there is no more disk 
space, I created 6 new hosts for the collection 'messages' (shards: 3 - repl: 2).


'intradesk + lvs':
solr01-dcg
solr01-gs
solr02-dcg
solr02-gs
solr03-dcg
solr03-gs
solr04-dcg
solr04-gs
solr05-dcg
solr05-gs

'messages':
solr06-dcg
solr06-gs
solr07-dcg
solr07-gs
solr08-dcg
solr08-gs

So when doing a select, I can talk to any host. When updating I must 
talk to a host with at least 1 shard on it.


I created the new 'messages' collection with the following command to get it 
onto the new hosts (06 - 08): 
http://solr01-dcg.intnet.smartbit.be:8983/solr/admin/collections?action=CREATE&name=messages&numShards=3&replicationFactor=2&collection.configName=smsc&createNodeSet=solr06-gs.intnet.smartbit.be:8983_solr,solr06-dcg.intnet.smartbit.be:8983_solr,solr07-gs.intnet.smartbit.be:8983_solr,solr07-dcg.intnet.smartbit.be:8983_solr,solr08-gs.intnet.smartbit.be:8983_solr,solr08-dcg.intnet.smartbit.be:8983_solr



They are all in the same config set 'smsc'.

Below is the code:

$client = new SolrClient(
    array(
        'hostname'  => 'solr01-dcg.intnet.smartbit.be',
        'port'      => 8983,
        'login'     => '***',
        'password'  => '***',
        'path'      => 'solr/messages',
        'wt'        => 'json'
    )
);

$doc = new SolrInputDocument();

$doc->addField('id',               $uniqueID);
$doc->addField('smsc_ssid',        $ssID);
$doc->addField('smsc_module',      $i['module']);
$doc->addField('smsc_modulekey',   $i['moduleKey']);
$doc->addField('smsc_courseid',    $courseID);
$doc->addField('smsc_description', $i['description']);
$doc->addField('smsc_content',     $i['content']);
$doc->addField('smsc_lastdate',    $lastdate);
$doc->addField('smsc_userid',      $userID);

$client->addDocument($doc);

The exception I get looks like this:
exception 'SolrClientException' with message 'Unsuccessful update 
request. Response Code 200. (null)'


Nothing special to find in the solr log.

Any idea?


Arkadi

On 04/24/2013 08:43 PM, Mark Miller wrote:

Sorry - need to correct myself - updates worked the same as read requests - 
they also needed to hit a SolrCore in order to get forwarded to the right node. 
I was not thinking clearly when I said this applied to just reads and not 
writes. Both needed a SolrCore to do their work - with the request proxying, 
this is no longer the case, so you can hit Solr instances with no SolrCores or 
with SolrCores that are not part of the collection you are working with, and 
both read and write side requests are now proxied to a suitable node that has a 
SolrCore that can do the search or forward the update (or accept the update).

- Mark

On Apr 23, 2013, at 3:38 PM, Mark Miller markrmil...@gmail.com wrote:


We have a 3rd release candidate for 4.3 being voted on now.

I have never tested this feature with Tomcat - only Jetty. Users have reported 
it does not work with Tomcat. That leads one to think it may have a problem in 
other containers as well.

A previous contributor donated a patch that explicitly flushes a stream in our 
proxy code - he says this allows the feature to work with Tomcat. I committed 
this fix - the flush can't hurt, and given the previous contributions of 
this individual, I'm fairly confident the fix makes things work in Tomcat. I 
have no first hand knowledge that it does work though.

You might take the RC for a spin and test it out yourself: 
http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/

- Mark

On Apr 23, 2013, at 3:20 PM, Furkan KAMACI furkankam...@gmail.com wrote:


Hi Mark;

All in all, you are saying that when 4.3 is tagged in the repository (I mean, when it
is ready) this feature will work for Tomcat too in a stable version?


2013/4/23 Mark Miller markrmil...@gmail.com


On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote:


What exactly is the 'request proxying' thing that doesn't work on

tomcat?  Is this something different from basic SolrCloud operation where
you send any kind of request to any server and they get directed where they
need to go? I haven't heard of that not working on tomcat before.

Before 4.2, if you made a read request to a node that didn't contain part
of the collection you were 

Re: Using Solr For a Real Search Engine

2013-04-25 Thread Furkan KAMACI
Hi Otis;

You are right. start.jar starts up Jetty, and there is a war file under the
example directory which Jetty deploys, is that true?

2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com

 Suggestion :
 Don't call this "embedded Jetty", to avoid confusion with an actual embedded
 Jetty.

 Otis
 Solr & ElasticSearch Support
 http://sematext.com/
 On Apr 23, 2013 4:56 PM, Furkan KAMACI furkankam...@gmail.com wrote:

  Thanks for the answers. I will go with embedded Jetty for my SolrCloud.
  If I run into something important I will share my experiences with
  you.
 
  2013/4/23 Shawn Heisey s...@elyograg.org
 
   On 4/23/2013 2:25 PM, Furkan KAMACI wrote:
  
    Is there any documentation that explains using Jetty as embedded or not? I
    use Solr deployed on Tomcat, but after your message I will consider
    Jetty. If we think about other issues, i.e. when I want to update my Solr
    jars/wars etc. (this is just a foo example), what pros and cons do Tomcat
    or Jetty have?
  
  
   The Jetty in the example is only 'embedded' in the sense that you don't
   have to install it separately.  It is not special -- the Jetty
 components
   are not changed at all, a subset of them is just included in the Solr
   download with a tuned configuration file.
  
   If you go to www.eclipse.org/jetty and download the latest stable-8
   version, you'll see some familiar things - start.jar, an etc
 directory, a
   lib directory, and a contexts directory.  They have more in them than
 the
   example does -- extra functionality Solr doesn't need.  If you want to
   start the downloaded version, you can use 'java -jar start.jar' just
 like
   you do with Solr.
  
   Thanks,
   Shawn
  
  
 



Re: filter before facet

2013-04-25 Thread Toke Eskildsen
On Wed, 2013-04-24 at 23:10 +0200, Daniel Tyreus wrote:
 But why is it slow to generate facets on a result set of 0? Furthermore,
 why does it take the same amount of time to generate facets on a result set
 of 2000 as 100,000 documents?

The default faceting method for your query is field cache. Field cache
faceting works by generating a structure for all the values for the
field in the whole corpus. It is exactly the same work whether you hit
0, 2K or 100M documents with your query.

After the structure has been built, the actual counting of values in the
facet is fast. There is not much difference between 2K and 100K hits.

 This leads me to believe that the FQ is being applied AFTER the facets are
 calculated on the whole data set. For my use case it would make a ton of
 sense to apply the FQ first and then facet. Is it possible to specify this
 behavior or do I need to get into the code and get my hands dirty?

As you write later, you have tried fc, enum and fcs, with fcs having the
fastest first-request time. That is understandable as it is
segment-oriented and (nearly) just a matter of loading the values
sequentially from storage. However, the general observation is that it
is about 10 times as slow as the fc-method for subsequent queries. Since
you are doing NRT that might still leave fcs as the best method for you.
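
(For reference, the faceting method is selected per request with the
facet.method parameter; the field name below is only an illustration:)

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fcs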

As for creating a new faceting implementation that avoids the startup
penalty by using only the found documents, then it is technically quite
simple: Use stored fields, iterate the hits and request the values.
Unfortunately this scales poorly with the number of hits, so unless you
can guarantee that you will always have small result sets, this is
probably not a viable option.
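
(A minimal SolrJ sketch of that idea - the server URL, query and field name
are assumptions, and as noted above it only pays off for small result sets:)

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class StoredFieldFacets {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("type:book");
        query.setFields("category");  // the stored field to "facet" on
        query.setRows(1000);          // only viable for small result sets
        QueryResponse response = server.query(query);

        // Count the stored values of the actual hits ourselves.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (SolrDocument doc : response.getResults()) {
            Object value = doc.getFieldValue("category");
            if (value == null) continue;
            String key = value.toString();
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        System.out.println(counts);
    }
}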

- Toke Eskildsen, State and University Library, Denmark



Re: JVM Parameters to Startup Solr?

2013-04-25 Thread Toke Eskildsen
On Wed, 2013-04-24 at 18:03 +0200, Mark Miller wrote:
 On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:
 
   -XX:OnOutOfMemoryError="kill -9 %p" -XX:+HeapDumpOnOutOfMemoryError
 
 The way I like to handle this is to have the OOM trigger a little script or 
 set of cmds that logs the issue and kills the process.
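
(For illustration, a minimal sketch of such a hook - the path, log file and
use of %p are assumptions to adapt:)

#!/bin/sh
# oom_killer.sh - wired up via -XX:OnOutOfMemoryError="/opt/solr/oom_killer.sh %p"
# Log the event, then kill the JVM so the load balancer can fail over.
echo "`date`: OutOfMemoryError in Solr (pid $1)" >> /var/log/solr/oom.log
kill -9 "$1"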

We treat all Errors as fatal by writing to a dedicated log and shutting
down the JVM (which triggers the load balancer etc.). Unfortunately that
means that some XML + XSLT combinations can bring the JVM down due to
StackOverflowError. This might be a little too diligent as the Oracle
JVM running on Linux (our current setup) is resilient to Threads hitting
stack overflow.

- Toke Eskildsen, State and University Library, Denmark



Re: solr.StopFilterFactory doesn't work with wildcard

2013-04-25 Thread Dmitry Baranov
1) I use StopFilterFactory in the multiterm analyzer because without it the
query analyzer doesn't work with multi-terms, in particular terms with a wildcard.
2) I expect that:
<str name="rawquerystring">search_string_ss_i:(hp* pavilion* series* d4*)</str>
<str name="querystring">search_string_ss_i:(hp* pavilion* series* d4*)</str>
<str name="parsedquery">search_string_ss_i:hp* +search_string_ss_i:pavilion* +search_string_ss_i:d4*</str>
<str name="parsedquery_toString">+search_string_ss_i:hp* +search_string_ss_i:pavilion* +search_string_ss_i:d4*</str>
i.e. I expect that StopFilterFactory will work the same as for a query without
a wildcard.
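
(For reference, a sketch of how an explicit multiterm analyzer carrying the
StopFilterFactory is declared - the type name is an assumption and the
index/query analyzers are elided:)

<fieldType name="text_wild" class="solr.TextField" positionIncrementGap="100">
  <!-- index- and query-time analyzers omitted for brevity -->
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>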



Thanks for your answer



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-StopFilterFactory-doesn-t-work-with-wildcard-tp4058581p4058856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JVM Parameters to Startup Solr?

2013-04-25 Thread Furkan KAMACI
Could you explain what you mean by such scripts? What do they check and
do exactly?


2013/4/25 Toke Eskildsen t...@statsbiblioteket.dk

 On Wed, 2013-04-24 at 18:03 +0200, Mark Miller wrote:
  On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:
 
    -XX:OnOutOfMemoryError="kill -9 %p" -XX:+HeapDumpOnOutOfMemoryError
 
  The way I like to handle this is to have the OOM trigger a little script
 or set of cmds that logs the issue and kills the process.

 We treat all Errors as fatal by writing to a dedicated log and shutting
 down the JVM (which triggers the load balancer etc.). Unfortunately that
 means that some XML + XSLT combinations can bring the JVM down due to
 StackOverflowError. This might be a little too diligent as the Oracle
 JVM running on Linux (our current setup) is resilient to Threads hitting
 stack overflow.

 - Toke Eskildsen, State and University Library, Denmark




Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Alan Woodward
Hi Walter, Dmitry,

I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with some 
work-in-progress.  Have a look!

Alan Woodward
www.flax.co.uk


On 23 Apr 2013, at 07:40, Dmitry Kan wrote:

 Hello Walter,
 
 Have you had a chance to get something working with graphite, codahale and
 solr?
 
 Has anyone else tried these tools with Solr 3.x family? How much work is it
 to set things up?
 
 We have tried zabbix in the past. Even though it required lots of up front
 investment on configuration, it looks like a compelling option.
 In the meantime, we are looking into something more solr-tailored yet
 simple. Even without metrics persistence. Tried: jconsole and viewing stats
 via jmx. Main point for us now is to gather the RAM usage.
 
 Dmitry
 
 
 On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wun...@wunderwood.orgwrote:
 
 If it isn't obvious, I'm glad to help test a patch for this. We can run a
 simulated production load in dev and report to our metrics server.
 
 wunder
 
 On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
 
 That approach sounds great. --wunder
 
 On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
 
 I've been thinking about how to improve this reporting, especially now
 that metrics-3 (which removes all of the funky thread issues we ran into
 last time I tried to add it to Solr) is close to release.  I think we could
 go about it as follows:
 
 * refactor the existing JMX reporting to use metrics-3.  This would
 mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
 adding a JmxReporter, keeping the existing config logic to determine which
 JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
 the metrics-3 data back into SolrMBean format to keep the reporting
 backwards-compatible.  This seems like a lot of work for no visible
 benefit, but…
 * we can then add the ability to define other metrics reporters in
 solrconfig.xml.  There are already reporters for Ganglia and Graphite - you
 just add them to the Solr lib/ directory, configure them in solrconfig, and
 voila - Solr can be monitored using the same devops tools you use to
 monitor everything else.
 
 Does this sound sane?
 
 Alan Woodward
 www.flax.co.uk
 
 
 On 6 Apr 2013, at 20:49, Walter Underwood wrote:
 
 Wow, that really doesn't help at all, since these seem to only be
 reported in the stats page.
 
 I don't need another non-standard app-specific set of metrics,
 especially one that needs polling. I need metrics delivered to the common
 system that we use for all our servers.
 
 This is also why SPM is not useful for us, sorry Otis.
 
 Also, there is no time period on these stats. How do you graph the
 95th percentile? I know there was a lot of work on these, but they seem
 really useless to me. I'm picky about metrics, working at Netflix does that
 to you.
 
 wunder
 
 On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
 
 In the Jira, but not in the docs.
 
 It would be nice to have VM stats like GC, too, so we can have common
 monitoring and alerting on all our services.
 
 wunder
 
 On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
 
 It's there! :)
 http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
 
 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/
 
 On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood 
 wun...@wunderwood.org wrote:
 That sounds great. I'll check out the bug, I didn't see anything in
 the docs about this. And if I can't find it with a search engine, it
 probably isn't there.  --wunder
 
 On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
 
 On 3/29/2013 12:07 PM, Walter Underwood wrote:
 What are folks using for this?
 
 I don't know that this really answers your question, but Solr 4.1
 and
 later includes a big chunk of codahale metrics internally for
 request
 handler statistics - see SOLR-1972.  First we tried including the
 jar
 and using the API, but that created thread leak problems, so the
 source
 code was added.
 
 Thanks,
 Shawn
 
 
 
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 



Re: Solr 3.6.1: changing a field from stored to not stored

2013-04-25 Thread Majirus FANSI
Good to know I missed something about solr replication.
Thanks Jan


On 24 April 2013 17:42, Jan Høydahl jan@cominvent.com wrote:

  I would create a new core as slave of the existing configuration without
  replicating the core schema and configuration. This way I can get the

 This won't work, as master/slave replication copies the index files as-is.

 You should re-index all your data. You don't need to take down the cluster
 to do that, just re-index on top of what's there already, and your index
 will become smaller and smaller as merging kicks out the old data :)

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 24. apr. 2013 kl. 15:59 skrev Majirus FANSI majirus@gmail.com:

  I would create a new core as slave of the existing configuration without
  replicating the core schema and configuration. This way I can get the
  information from one index to the other while saving the space as fields
 in
  the new schema are mainly not stored. After the replication I would swap
  the cores for the online core to point to the right index dir and conf.
  i.e. the one with less stored fields.
 
  Maj
 
 
  On 24 April 2013 01:48, Petersen, Robert
  robert.peter...@mail.rakuten.comwrote:
 
  Hey I just want to verify one thing before I start doing this:  function
  queries only require fields to be indexed but don't require them to be
  stored right?
 
  -Original Message-
  From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com]
  Sent: Tuesday, April 23, 2013 4:39 PM
  To: solr-user@lucene.apache.org
  Subject: RE: Solr 3.6.1: changing a field from stored to not stored
 
  Good info, Thanks Hoss!  I was going to add a more specific fl=
 parameter
  to my queries at the same time.  Currently I am doing fl=*,score so that
  will have to be changed.
 
 
  -Original Message-
  From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
  Sent: Tuesday, April 23, 2013 4:18 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr 3.6.1: changing a field from stored to not stored
 
 
  : index?  I noticed I am unnecessarily storing some fields in my index
 and
  : I'd like to stop storing them without having to 'reindex the world'
 and
  : let the changes just naturally percolate into my index as updates come
  : in the normal course of things.  Do you guys think I could get away
 with
  : this?
 
  Yes, you can easily get away with this type of change w/o re-indexing,
  however you won't gain any immediate index size savings until each and
  every existing doc has been reindexed and the old copies expunged from
 the
  index via segment merges.
 
   the one hiccup that can affect people when doing this is what happens if
  you use something like fl=* (and likely hl=* as well) ... many
 places
  in Solr will try to avoid failure if a stored field is found in the
 index
  which isn't defined in the schema, and treat that stored value as a
 string
  (legacy behavior designed to make it easier for people to point Solr at
 old
  lucene indexes built w/o using Solr) ... so if these stored values are
 not
  strings, you might get some weird data in your response for these
 documents.
 
 
  -Hoss
 
 
 
 
 




Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Dmitry Kan
Hi Alan,

Great! What is the solr version you are patching?

Speaking of graphite, we have set it up recently to monitor our shard farm.
So far, since RAM usage has been the most important metric, we have been fine
with the pidstat command and a little script generating stats for carbon.
Having some additional stats from Solr itself would certainly be great to
have.

Dmitry
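
(For illustration, a minimal Java sketch of that kind of reporting - the host,
port and metric path are assumptions; carbon's plaintext protocol takes
"<path> <value> <timestamp>" lines on port 2003:)

import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.net.Socket;

public class HeapToCarbon {
    public static void main(String[] args) throws Exception {
        // Heap usage of *this* JVM; for a remote Solr you would attach
        // over JMX instead.
        long used = ManagementFactory.getMemoryMXBean()
                .getHeapMemoryUsage().getUsed();
        long now = System.currentTimeMillis() / 1000L;
        // Carbon plaintext protocol: "<metric path> <value> <timestamp>\n"
        Socket socket = new Socket("graphite.example.com", 2003);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        out.printf("solr.heap.used %d %d%n", used, now);
        out.close();
        socket.close();
    }
}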

On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward a...@flax.co.uk wrote:

 Hi Walter, Dmitry,

 I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with
 some work-in-progress.  Have a look!

 Alan Woodward
 www.flax.co.uk


 On 23 Apr 2013, at 07:40, Dmitry Kan wrote:

  Hello Walter,
 
  Have you had a chance to get something working with graphite, codahale
 and
  solr?
 
  Has anyone else tried these tools with Solr 3.x family? How much work is
 it
  to set things up?
 
  We have tried zabbix in the past. Even though it required lots of up
 front
  investment on configuration, it looks like a compelling option.
  In the meantime, we are looking into something more solr-tailored yet
  simple. Even without metrics persistence. Tried: jconsole and viewing
 stats
  via jmx. Main point for us now is to gather the RAM usage.
 
  Dmitry
 
 
  On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wun...@wunderwood.org
 wrote:
 
  If it isn't obvious, I'm glad to help test a patch for this. We can run
 a
  simulated production load in dev and report to our metrics server.
 
  wunder
 
  On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
 
  That approach sounds great. --wunder
 
  On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
 
  I've been thinking about how to improve this reporting, especially now
  that metrics-3 (which removes all of the funky thread issues we ran into
  last time I tried to add it to Solr) is close to release.  I think we
 could
  go about it as follows:
 
  * refactor the existing JMX reporting to use metrics-3.  This would
  mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
  adding a JmxReporter, keeping the existing config logic to determine
 which
  JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
  the metrics-3 data back into SolrMBean format to keep the reporting
  backwards-compatible.  This seems like a lot of work for no visible
  benefit, but…
  * we can then add the ability to define other metrics reporters in
  solrconfig.xml.  There are already reporters for Ganglia and Graphite -
 you
  just add them to the Solr lib/ directory, configure them in solrconfig,
 and
  voila - Solr can be monitored using the same devops tools you use to
  monitor everything else.
 
  Does this sound sane?
 
  Alan Woodward
  www.flax.co.uk
 
 
  On 6 Apr 2013, at 20:49, Walter Underwood wrote:
 
  Wow, that really doesn't help at all, since these seem to only be
  reported in the stats page.
 
  I don't need another non-standard app-specific set of metrics,
  especially one that needs polling. I need metrics delivered to the
 common
  system that we use for all our servers.
 
  This is also why SPM is not useful for us, sorry Otis.
 
  Also, there is no time period on these stats. How do you graph the
  95th percentile? I know there was a lot of work on these, but they seem
  really useless to me. I'm picky about metrics, working at Netflix does
 that
  to you.
 
  wunder
 
  On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
 
  In the Jira, but not in the docs.
 
  It would be nice to have VM stats like GC, too, so we can have
 common
  monitoring and alerting on all our services.
 
  wunder
 
  On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
 
  It's there! :)
 
  http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
 
  Otis
  --
  Solr & ElasticSearch Support
  http://sematext.com/
 
  On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood 
  wun...@wunderwood.org wrote:
  That sounds great. I'll check out the bug, I didn't see anything
 in
  the docs about this. And if I can't find it with a search engine, it
  probably isn't there.  --wunder
 
  On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
 
  On 3/29/2013 12:07 PM, Walter Underwood wrote:
  What are folks using for this?
 
  I don't know that this really answers your question, but Solr 4.1
  and
  later includes a big chunk of codahale metrics internally for
  request
  handler statistics - see SOLR-1972.  First we tried including the
  jar
  and using the API, but that created thread leak problems, so the
  source
  code was added.
 
  Thanks,
  Shawn
 
 
 
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 




Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Hi,
 is it possible to get an exact-match result if the search term is combined,
e.g. cats AND London NOT Leeds?


In previous threads I have read that it is possible to create a new field
of String type and perform a phrase search on it, but nowhere has the
above-mentioned combined search term been taken into consideration.

BR
Pawel



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Alan Woodward
This is on top of trunk at the moment, but would be back ported to 4.4 if there 
was interest.

Alan Woodward
www.flax.co.uk


On 25 Apr 2013, at 10:32, Dmitry Kan wrote:

 Hi Alan,
 
 Great! What is the solr version you are patching?
 
 Speaking of graphite, we have set it up recently to monitor our shard farm.
 So far since the RAM usage has been most important metric we were fine with
 pidstat command and a little script generating stats for carbon.
 Having some additional stats from SOLR itself would certainly be great to
 have.
 
 Dmitry
 
 On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward a...@flax.co.uk wrote:
 
 Hi Walter, Dmitry,
 
 I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with
 some work-in-progress.  Have a look!
 
 Alan Woodward
 www.flax.co.uk
 
 
 On 23 Apr 2013, at 07:40, Dmitry Kan wrote:
 
 Hello Walter,
 
 Have you had a chance to get something working with graphite, codahale
 and
 solr?
 
 Has anyone else tried these tools with Solr 3.x family? How much work is
 it
 to set things up?
 
 We have tried zabbix in the past. Even though it required lots of up
 front
 investment on configuration, it looks like a compelling option.
  In the meantime, we are looking into something more solr-tailored yet
 simple. Even without metrics persistence. Tried: jconsole and viewing
 stats
 via jmx. Main point for us now is to gather the RAM usage.
 
 Dmitry
 
 
 On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wun...@wunderwood.org
 wrote:
 
 If it isn't obvious, I'm glad to help test a patch for this. We can run
 a
 simulated production load in dev and report to our metrics server.
 
 wunder
 
 On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
 
 That approach sounds great. --wunder
 
 On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
 
 I've been thinking about how to improve this reporting, especially now
 that metrics-3 (which removes all of the funky thread issues we ran into
 last time I tried to add it to Solr) is close to release.  I think we
 could
 go about it as follows:
 
 * refactor the existing JMX reporting to use metrics-3.  This would
 mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
 adding a JmxReporter, keeping the existing config logic to determine
 which
 JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
 the metrics-3 data back into SolrMBean format to keep the reporting
 backwards-compatible.  This seems like a lot of work for no visible
 benefit, but…
 * we can then add the ability to define other metrics reporters in
 solrconfig.xml.  There are already reporters for Ganglia and Graphite -
 you
  just add them to the Solr lib/ directory, configure them in solrconfig,
 and
 voila - Solr can be monitored using the same devops tools you use to
 monitor everything else.
 
 Does this sound sane?
 
 Alan Woodward
 www.flax.co.uk
 
 
 On 6 Apr 2013, at 20:49, Walter Underwood wrote:
 
 Wow, that really doesn't help at all, since these seem to only be
 reported in the stats page.
 
 I don't need another non-standard app-specific set of metrics,
 especially one that needs polling. I need metrics delivered to the
 common
 system that we use for all our servers.
 
 This is also why SPM is not useful for us, sorry Otis.
 
 Also, there is no time period on these stats. How do you graph the
 95th percentile? I know there was a lot of work on these, but they seem
 really useless to me. I'm picky about metrics, working at Netflix does
 that
 to you.
 
 wunder
 
 On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
 
 In the Jira, but not in the docs.
 
 It would be nice to have VM stats like GC, too, so we can have
 common
 monitoring and alerting on all our services.
 
 wunder
 
 On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
 
 It's there! :)
 
  http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
 
 Otis
 --
  Solr & ElasticSearch Support
 http://sematext.com/
 
 On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood 
 wun...@wunderwood.org wrote:
 That sounds great. I'll check out the bug, I didn't see anything
 in
 the docs about this. And if I can't find it with a search engine, it
 probably isn't there.  --wunder
 
 On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
 
 On 3/29/2013 12:07 PM, Walter Underwood wrote:
 What are folks using for this?
 
 I don't know that this really answers your question, but Solr 4.1
 and
 later includes a big chunk of codahale metrics internally for
 request
 handler statistics - see SOLR-1972.  First we tried including the
 jar
 and using the API, but that created thread leak problems, so the
 source
 code was added.
 
 Thanks,
 Shawn
 
 
 
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 
 
 



Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
Hi Pawel,

Not sure which parser you are using; I am using edismax and tried using the
bq parameter to boost the results having exact matches at the top.
You may try something like:
q=cats AND London NOT Leeds&bq=cats^50

In edismax, the pf and pf2 parameters also need some tuning to get the results
at the top.
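
(For illustration, a fuller request along those lines - the field name
'content' and all the boosts are assumptions to tune:)

http://localhost:8983/solr/select?defType=edismax&q=cats AND London NOT Leeds&qf=content&pf=content^20&pf2=content^10&bq=content:"cats London"^50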

HTH,
Sandeep


On 25 April 2013 10:33, vsl ociepa.pa...@gmail.com wrote:

 Hi,
  is it possible to get exact matched result if the search term is combined
 e.g. cats AND London NOT Leeds


 In previous threads I have read that it is possible to create a new field
 of String type and perform phrase search on it but nowhere the above
 mentioned combined search term had been taken into consideration.

 BR
 Pawel



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1

2013-04-25 Thread Shahar Davidson
Hi all,

I'm trying to run 'ant idea' on 4.2.* and I'm getting invalid sha1 error 
messages. (see below)

I'll appreciate any help,

Shahar
===
.
.
.
resolve
ivy:retrieve

:: problems summary ::
 WARNINGS
problem while downloading module descriptor: 
http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid 
sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (72ms)
problem while downloading module descriptor: 
http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom:
 invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 
(53ms)
problem while downloading module descriptor: 
http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: 
expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms)
problem while downloading module descriptor: 
http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: 
invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 
(58ms)

module not found: org.apache.ant#ant;1.8.2
.
.
.
 public: tried
  http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
 sonatype-releases: tried
  
http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
 maven.restlet.org: tried
  http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
 working-chinese-mirror: tried
  
http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
problem while downloading module descriptor: 
http://repo1.maven.org/maven2/junit/junit/4.10/junit-4.10.pom: invalid sha1: 
expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (74ms)
problem while downloading module descriptor: 
http://oss.sonatype.org/content/repositories/releases/junit/junit/4.10/junit-4.10.pom:
 invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 
(60ms)
problem while downloading module descriptor: 
http://maven.restlet.org/junit/junit/4.10/junit-4.10.pom: invalid sha1: 
expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (58ms)
problem while downloading module descriptor: 
http://mirror.netcologne.de/maven2/junit/junit/4.10/junit-4.10.pom: invalid 
sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms)

module not found: junit#junit;4.10
.
.
.
.
 ::
::  UNRESOLVED DEPENDENCIES ::
::
:: org.apache.ant#ant;1.8.2: not found
:: junit#junit;4.10: not found
:: com.carrotsearch.randomizedtesting#junit4-ant;2.0.8: not 
found
:: 
com.carrotsearch.randomizedtesting#randomizedtesting-runner;2.0.8: not found
::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
D:\apache_solr_4.2.1\lucene\common-build.xml:348: impossible to resolve 
dependencies:
resolve failed - see output for details


Re: Solr faceted search UI

2013-04-25 Thread Majirus FANSI
Hi Rocha,
In your webapp I guess you have at least a view and a service layer.
The indexing and search modules should preferably be hosted in the service
layer.
I recommend you read the Api doc (
http://lucene.apache.org/solr/4_2_1/solr-solrj/index.html) to get a sense
of what you can do with SolrJ.
Following is a basic example of facets with SolrJ:

// adding the query keyword to the SolrQuery object
mySolrQuery.setQuery(queryBuilder.toString());
// add a facet field
mySolrQuery.addFacetField(myFieldName);
// add a facet query
validatedFromTheLast7DaysFacetQuery =
    validationDateField + ":[NOW/DAY-7DAY TO NOW]";
mySolrQuery.addFacetQuery(validatedFromTheLast7DaysFacetQuery);

// send the request as HTTP POST; with HTTP GET you run into issues when
// the request string is too long.
QueryResponse queryResponse = getSolrHttpServer().query(mySolrQuery,
    METHOD.POST);

// write a transformer to convert the Solr response to a format
// understandable by the caller (the client of the search service).
// The list of results to transform:
SolrDocumentList responseSolrDocumentList = queryResponse.getResults();
// get the facet fields, iterate over the list, parse each FacetField and
// extract the information you are interested in
queryResponse.getFacetFields();
// get the facet queries from the response
Map<String, Integer> mapOfFacetQueries = queryResponse.getFacetQuery();

The keys of this map are your facet queries. The values are the counts you
display to the user. In general, I have an identifier for each facet query.
When I parse the keys of this map of facet queries, I return the identifier
of each facet along with its count (if the count > 0, of course). The caller
is aware of this identifier so it knows what to display to the user.

When the user clicks on a facet, you send it as a search criterion along
with the initial keywords to the search service. The criterion resulting
from the facet is treated as a filter query. That is how faceted search
works. Adding a filter to your query is as simple as this snippet:
mySolrQuery.addFilterQuery(myFilterQuery). Should you be filtering because
your user clicked on the previously defined facet query, then the filter
query is the same as the facet query, that is myFilterQuery =
validationDateField + ":[NOW/DAY-7DAY TO NOW]".

I hope this helps.

Cheers,

Maj


On 24 April 2013 17:27, richa striketheg...@gmail.com wrote:

 Hi Maj,

 Thanks for your suggestion.
 Tell me one thing: do you have any example of SolrJ? Suppose I decide to
 use SolrJ in a simple web application, to display faceted search on a web page.
 Where will this fit in? What will be the flow?

 Please suggest.

 Thanks


 On Wed, Apr 24, 2013 at 11:01 AM, Majirus FANSI [via Lucene] 
 ml-node+s472066n4058610...@n3.nabble.com wrote:

  Hi richa,
  You can use solrJ (
  http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr)
  to query your solr index.
  On the wiki page indicated, you will see example of faceted search using
  solrJ.
  The 2009 article by Yonik available on searchhub
  (http://searchhub.org/2009/09/02/faceted-search-with-solr/)
  is a good tutorial on faceted search.
  Whether you go for an MVC framework or not is up to you. It is recommended,
  though,
  to develop a search engine application in a Service Oriented Architecture.
  Regards,
 
  Maj
 
 
  On 24 April 2013 16:43, richa [hidden email] wrote:
 
   Hi,
   I am working on a POC, where I have to display faceted search results on a
   web page. Can anybody please suggest what setup I need to
   configure to display them? I would prefer Java technologies. Just to
   mention, I
   have SolrCloud running on a remote server.
   I would like to know:
    1. Should I use an MVC framework?
    2. How will my local app interact with the remote Solr server?
    3. How will I send a query through Java code, and what technology should I
    use to display the faceted search results?
  
   Please help me on this.
  
   Thanks,
  
  
  
   --
   View this message in context:
  
 http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 
 
 
 




 --
 View this message in context:
 

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Thanks for your reply. I am using edismax as well. What I want to get is the
exact match without other results that could be close to the given term.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Jack Krupansky
As indicated previously, yes, exact matching is possible in Solr. You, the 
developer, have full control over the exactness or inexactness of all 
queries. If any query is inexact in some way, it is solely due to decisions 
that you, the developer, have made.


Generally speaking, inexactness, fuzziness if you will, is the precise 
quality that most developers - and users - are looking for in search. I 
mean, generally, having to be precise and exact in search requests... is 
tedious and a real drag, and something to be avoided - in general.


But, that's what string fields, the white space tokenizer, the regular 
expression tokenizer, and full developer control of the token filter 
sequence are for - to let you, the developer, to have full control, 
including all aspects of exactness of search.


As to your specific question - there is nothing about the "AND", "OR", or 
"NOT" (or "+" or "-") operators that is in any way anything other than 
"exact", in terms of document matching. "OR" can be considered a form of 
inexactness in that presence of a term is optional, but "AND" means 
absolutely MUST, and "NOT" means absolutely MUST_NOT. About as exact as 
anything could get.


Scoring and relevancy are another story, but have nothing to do with 
matching or exactness. Exactness and matching only affect whether a 
document is counted in numFound and included in results or not, not the 
ordering of results.


But why are you asking? Is there some problem you are trying to solve? Is 
there some query that is not giving you the results you expect? If this is 
simply a general information question, fine, answered. But if you are trying 
to solve some problem, you will need to clearly state your problem rather 
than asking some general, abstract question.


-- Jack Krupansky

-Original Message- 
From: vsl

Sent: Thursday, April 25, 2013 5:33 AM
To: solr-user@lucene.apache.org
Subject: Exact matching in Solr 3.6.1

Hi,
is it possible to get exact matched result if the search term is combined
e.g. cats AND London NOT Leeds


In the previus threads I have read that it is possible to create new field
of String type and perform phrase search on it but nowhere the above
mentioned combined search term had been taken into consideration.

BR
Pawel



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
I think in that case, making the field a String type is your option; however,
remember that it'd be case-sensitive.
Another approach is to create a case-insensitive field type and do
searches on those fields only.

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
    omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
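
(To search it, one would typically add a parallel field and copy into it -
the field names here are assumptions:)

<field name="content_exact" type="string_ci" indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>

Then include content_exact in the edismax qf (e.g. qf=content content_exact^10)
so exact matches score highest.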

Can you provide your fields and dismax config and, if possible, records you
would like returned and records you do not want?

-S


On 25 April 2013 11:50, vsl ociepa.pa...@gmail.com wrote:

 Thanks for your reply. I am using edismax as well. What I want to get is
 the
 exact match without other results that could be close to the given term.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058876.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
I will explain my case in the example below:

We have three documents with given content:

First document:
london cats glenvilet

Second document
london cat glenvilet leeds

Third document
london cat glenvilet 

Search term: cats AND London NOT Leeds 

Expected result: First document
Current result: First document, Third document

Additionally, the next requirement says that when I type as search term: cats
AND Londo NOT Leeds 
then I should get the spell check collation: cats AND London NOT Leeds 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058890.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Jack Krupansky
It sounds as if your field type is doing stemming - mapping "cats" to "cat". 
That is a valuable feature of search, but if you wish to turn it off... go 
ahead and do so by editing the field type. But just be aware that turning 
off stemming is a great loss of search flexibility.


Who knows, maybe you might want to have both stemmed and unstemmed fields in 
an edismax query and give a higher boost to the unstemmed field - but it's 
not up to us to guess your requirements. We're dependent on you clearly 
expressing your requirements.


As indicated before, you, the developer have complete control here. But... 
it is up to you, the developer to choose wisely, to suit your application 
requirements. But if you don't describe your requirements with greater 
precision and detail, we won't be able to be of much help to you.


Your second (only two) requirement relates to spellcheck, which is 
completely unrelated to query matching and exactness. Yes, Solr has a 
spellcheck capability, and yes, it does collation. Is that all you are 
asking? If there is a specific issue, please be specific about it.
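
(For reference, collation on a query like the one above is requested with the
spellcheck parameters, roughly as below - the handler and spellcheck component
configuration are assumed to be in place:)

http://localhost:8983/solr/select?defType=edismax&q=cats AND Londo NOT Leeds&spellcheck=true&spellcheck.collate=true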


-- Jack Krupansky

-Original Message- 
From: vsl

Sent: Thursday, April 25, 2013 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching in Solr 3.6.1

I will explain my case in the example below:

We have three documents with given content:

First document:
london cats glenvilet

Second document
london cat glenvilet leeds

Third document
london cat glenvilet

Search term: cats AND London NOT Leeds

Expected result: First document
Current result: First document, Third document

Additionally, the next requirement says that when I type as search term: cats
AND Londo NOT Leeds
then I should get spell check collation: cats AND London NOT Leeds




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058890.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Dmitry Kan
We are very much interested in 3.4.


On Thu, Apr 25, 2013 at 12:55 PM, Alan Woodward a...@flax.co.uk wrote:

 This is on top of trunk at the moment, but would be back ported to 4.4 if
 there was interest.

 Alan Woodward
 www.flax.co.uk


 On 25 Apr 2013, at 10:32, Dmitry Kan wrote:

  Hi Alan,
 
  Great! What is the solr version you are patching?
 
  Speaking of graphite, we have set it up recently to monitor our shard
 farm.
  So far since the RAM usage has been most important metric we were fine
 with
  pidstat command and a little script generating stats for carbon.
  Having some additional stats from SOLR itself would certainly be great to
  have.
 
  Dmitry
 
  On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward a...@flax.co.uk wrote:
 
  Hi Walter, Dmitry,
 
  I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with
  some work-in-progress.  Have a look!
 
  Alan Woodward
  www.flax.co.uk
 
 
  On 23 Apr 2013, at 07:40, Dmitry Kan wrote:
 
  Hello Walter,
 
  Have you had a chance to get something working with graphite, codahale
  and
  solr?
 
  Has anyone else tried these tools with Solr 3.x family? How much work
 is
  it
  to set things up?
 
  We have tried zabbix in the past. Even though it required lots of up
  front
  investment on configuration, it looks like a compelling option.
   In the meantime, we are looking into something more solr-tailored yet
  simple. Even without metrics persistence. Tried: jconsole and viewing
  stats
  via jmx. Main point for us now is to gather the RAM usage.
 
  Dmitry
 
 
  On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood 
 wun...@wunderwood.org
  wrote:
 
  If it isn't obvious, I'm glad to help test a patch for this. We can
 run
  a
  simulated production load in dev and report to our metrics server.
 
  wunder
 
  On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
 
  That approach sounds great. --wunder
 
  On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
 
  I've been thinking about how to improve this reporting, especially
 now
  that metrics-3 (which removes all of the funky thread issues we ran
 into
  last time I tried to add it to Solr) is close to release.  I think we
  could
  go about it as follows:
 
  * refactor the existing JMX reporting to use metrics-3.  This would
  mean replacing the SolrCore.infoRegistry map with a MetricsRegistry,
 and
  adding a JmxReporter, keeping the existing config logic to determine
  which
  JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler
 translate
  the metrics-3 data back into SolrMBean format to keep the reporting
  backwards-compatible.  This seems like a lot of work for no visible
  benefit, but…
  * we can then add the ability to define other metrics reporters in
  solrconfig.xml.  There are already reporters for Ganglia and Graphite
 -
  you
   just add them to the Solr lib/ directory, configure them in
 solrconfig,
  and
  voila - Solr can be monitored using the same devops tools you use to
  monitor everything else.
 
  Does this sound sane?
 
  Alan Woodward
  www.flax.co.uk
 
 
  On 6 Apr 2013, at 20:49, Walter Underwood wrote:
 
  Wow, that really doesn't help at all, since these seem to only be
  reported in the stats page.
 
  I don't need another non-standard app-specific set of metrics,
  especially one that needs polling. I need metrics delivered to the
  common
  system that we use for all our servers.
 
  This is also why SPM is not useful for us, sorry Otis.
 
  Also, there is no time period on these stats. How do you graph the
  95th percentile? I know there was a lot of work on these, but they
 seem
  really useless to me. I'm picky about metrics, working at Netflix does
  that
  to you.
 
  wunder
 
  On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
 
  In the Jira, but not in the docs.
 
  It would be nice to have VM stats like GC, too, so we can have
  common
  monitoring and alerting on all our services.
 
  wunder
 
  On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
 
  It's there! :)
 
   http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
 
  Otis
  --
   Solr & ElasticSearch Support
  http://sematext.com/
 
  On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood 
  wun...@wunderwood.org wrote:
  That sounds great. I'll check out the bug, I didn't see anything
  in
  the docs about this. And if I can't find it with a search engine, it
  probably isn't there.  --wunder
 
  On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
 
  On 3/29/2013 12:07 PM, Walter Underwood wrote:
  What are folks using for this?
 
  I don't know that this really answers your question, but Solr
 4.1
  and
  later includes a big chunk of codahale metrics internally for
  request
  handler statistics - see SOLR-1972.  First we tried including
 the
  jar
  and using the API, but that created thread leak problems, so
 the
  source
  code was added.
 
  Thanks,
  Shawn
 
 
 
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
  --
  Walter Underwood
  

Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-25 Thread Dmitry Kan
Are there any distributed-facet gurus on the list? I would be ready to try
sensible ideas, including at the source code level, if one of you could
give me a hand.

Dmitry


On Wed, Apr 24, 2013 at 3:08 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hello list,

 We deal with an anomaly when doing a distributed facet query against 102
 shards.

 The problem manifests itself in both the frontend Solr (router) and a
 shard. Each time the request is executed, a different shard is
 affected (at random, hence the anomaly).

 The query is: http://router_host:router_port/solr/select?q=test&facet=true&facet.field=field_of_type_long&facet.limit=1330&facet.mincount=1&rows=1&facet.sort=index&facet.zeros=false&facet.offset=0
 I have omitted the shards parameter.

 The router log:

 request: http://10.155.244.181:9150/solr/select
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
 at 
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
 at 
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)

 Notice the port of the shard that is affected. That port changes all the
 time, even for the same request.
 The log entry is prepended with lines:

 SEVERE: org.apache.solr.common.SolrException: Internal Server Error

 Internal Server Error

 (they are not in the pastebin link)

 The shard log:

 Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NullPointerException
 at java.io.StringReader.<init>(StringReader.java:50)
 at 
 org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
 at 
 org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
 at org.apache.solr.search.QParser.getQuery(QParser.java:142)
 at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Thread.java:722)

 Apr 24, 2013 11:08:49 AM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select params={} status=500 QTime=2
 Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NullPointerException
 at java.io.StringReader.<init>(StringReader.java:50)
 at 
 org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
 at 
 org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
 at org.apache.solr.search.QParser.getQuery(QParser.java:142)
 at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
 at 
 

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Exact matching is just one of my cases. Currently I perform search on a field
with the following definition:

<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
        language="English"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
        language="English"/>
  </analyzer>
</fieldType>

This field definition fulfils all the other requirements.
Examples:
- special characters
- passengers -> passenger

The case of exact matching is the last one I have to complete.

The problem with cats -> cat is caused by the SnowballPorterFilterFactory. This
is what I know.

The question is whether it is possible to handle exact matching (edismax),
returning only the one result described in the previous post, without
influencing the existing functionality.

BR 
Pawel



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr maven install - authorization problem when downloading maven.restlet.org dependencies

2013-04-25 Thread Shahar Davidson
Hi,

I'm trying to build Solr 4.2.x with Maven and I'm getting the following error 
in solr-core:

[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1.341s
[INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
[INFO] Final Memory: 12M/174M
[INFO] 
[ERROR] Failed to execute goal on project solr-core: Could not resolve 
dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed 
to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT 
(compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), 
commons-codec:commons-codec:jar:1.7 (compile), commons-cli:commons-cli:jar:1.2 
(compile), commons-fileupload:commons-fileupload:jar:1.2.1 (compile), 
org.restlet.jee:org.restlet:jar:2.1.1 (compile), 
org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), 
org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 
(compile), commons-io:commons-io:jar:2.1 (compile), 
commons-lang:commons-lang:jar:2.6 (compile), com.google.guava:guava:jar:13.0.1 
(compile), org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), 
org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), 
org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), 
org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), 
javax.servlet:servlet-api:jar:2.4 (provided), 
org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), 
org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), 
org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed 
to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could 
not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to 
maven-restlet (http://maven.restlet.org): Not authorized, 
ReasonPhrase:Unauthorized. - [Help 1]


Has anyone encountered this issue?

Thanks,

Shahar.


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Majirus FANSI
Hi Pawel,
If you are searching on any field of type text_general as defined in your
schema, you are stuck with the Porter stemmer. In fact, with your settings Solr
is not aware of a term like "cats", only "cat"; thus there is no way to do an
exact match on "cats" in this case.
What you can do is create a new field type and, with the copyField
facility, save a verbatim version of your data in that field while the field
of type text_general still performs stemming. Finally, add the new
field to the list of searchable fields with a higher boost so that an exact
match receives the highest score.
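
A minimal sketch of that setup (field names, the string type, and the boost
are illustrative, not taken from your schema):

<!-- stemmed field, as today -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<!-- verbatim copy: type "string" matches the whole value exactly;
     use an unstemmed text type instead for word-level exact matching -->
<field name="content_exact" type="string" indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>

With edismax, something like qf=content content_exact^10 should then push
exact matches to the top.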
Hope this helps.
regards,

Maj


On 25 April 2013 14:43, vsl ociepa.pa...@gmail.com wrote:

 Exact matching is just one of my cases.  Currently I perform search on a
 field with the given definition:

 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
             catenateWords="1" catenateNumbers="1" catenateAll="0"
             splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
   </analyzer>
   <analyzer type="query">
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
             catenateWords="1" catenateNumbers="1" catenateAll="0"
             splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
   </analyzer>
 </fieldType>

 This field definition fulfills all my other requirements.
 Examples:
 - special characters
 - passengers -> passenger

 The case with exact matching is the last one I have to complete.

 The problem with cats -> cat is caused by SnowballPorterFilterFactory.
 This is what I know.

 The question is whether it is possible to handle exact matching (edismax)
 with only one result, as described in the previous post, without
 influencing the existing functionality?

 BR
 Pawel



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058907.html
 Sent from the Solr - User mailing list archive at Nabble.com.



FieldCache insanity with field used as facet and group

2013-04-25 Thread Elodie Sannier

Hello,

I am using the Lucene FieldCache with SolrCloud and I have insane instances 
with messages like:

VALUEMISMATCH: Multiple distinct value objects for SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713

All insane instances are for a field merchantid of type int, used both as a
facet and as a group field.

I'm using a custom SearchHandler which makes two sub-queries: a first query
with group.field=merchantid and a second query with facet.field=merchantid.
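
For reference, the two sub-queries boil down to parameters like these
(abridged):

/select?q=*:*&group=true&group.field=merchantid
/select?q=*:*&facet=true&facet.field=merchantid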

When I use the parameter facet.method=enum, I don't get the insane
instances (presumably because enum counts terms via the filterCache instead
of un-inverting the field into the FieldCache), but I'm not sure it is the
right fix.

Can this insanity have a performance impact?
How can I fix it?

Elodie Sannier


Kelkoo SAS
A société par actions simplifiée with a capital of €4,168,964.30
Registered office: 8, rue du Sentier, 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for
their addressees. If you are not the intended recipient of this message, please
delete it and notify the sender.


Re: Solr maven install - authorization problem when downloading maven.restlet.org dependencies

2013-04-25 Thread Dmitry Kan
Building the solr 4.2.1 worked fine for me. Here is the relevant portion of
ivy-settings.xml that I had to change:

<chain name="default" returnFirst="true" checkmodified="true"
       changingPattern=".*SNAPSHOT">
  <resolver ref="local"/>
  <!-- <resolver ref="local-maven-2"/> -->
  <resolver ref="main"/>
  <!-- <resolver ref="sonatype-releases"/> -->  <!-- COMMENTED OUT -->
  <resolver ref="maven.restlet.org"/>
  <resolver ref="working-chinese-mirror"/>
</chain>

Dmitry


On Thu, Apr 25, 2013 at 3:53 PM, Shahar Davidson shah...@checkpoint.comwrote:

 Hi,

 I'm trying to build Solr 4.2.x with Maven and I'm getting the following
 error in solr-core:

 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 
 [INFO] Total time: 1.341s
 [INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
 [INFO] Final Memory: 12M/174M
 [INFO]
 
 [ERROR] Failed to execute goal on project solr-core: Could not resolve
 dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT:
 Failed to collect dependencies for
 [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile),
 org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile),
 commons-codec:commons-codec:jar:1.7 (compile),
 commons-cli:commons-cli:jar:1.2 (compile),
 commons-fileupload:commons-fileupload:jar:1.2.1 (compile),
 org.restlet.jee:org.restlet:jar:2.1.1 (compile),
 org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile),
 org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile),
 org.slf4j:slf4j-jdk14:jar:1.6.4 (compile), commons-io:commons-io:jar:2.1
 (compile), commons-lang:commons-lang:jar:2.6 (compile),
 com.google.guava:guava:jar:13.0.1 (compile),
 org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?),
 org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?),
 org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?),
 org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime),
 javax.servlet:servlet-api:jar:2.4 (provided),
 org.apache.httpcomponents:httpclient:jar:4.2.3 (compile),
 org.apache.httpcomponents:httpmime:jar:4.2.3 (compile),
 org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]:
 Failed to read artifact descriptor for
 org.restlet.jee:org.restlet:jar:2.1.1: Could not transfer artifact
 org.restlet.jee:org.restlet:pom:2.1.1 from/to maven-restlet (
 http://maven.restlet.org): Not authorized, ReasonPhrase:Unauthorized. -
 [Help 1]


 Has anyone encountered this issue?

 Thanks,

 Shahar.



Re: Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1

2013-04-25 Thread Steve Rowe
Hi Shahar,

I suspect you may have an older version of Ivy installed - the errors you're 
seeing look like IVY-1194 https://issues.apache.org/jira/browse/IVY-1194, 
which was fixed in Ivy 2.2.0.  Lucene/Solr uses Ivy 2.3.0.  Take a look in 
C:\Users\account\.ant\lib\ and remove older versions of ivy-*.jar, then run 
'ant ivy-bootstrap' from the Solr source code to download ivy-2.3.0.jar to 
C:\Users\account\.ant\lib\.

Just now on a Windows 7 box, I downloaded solr-4.2.1-src.tgz from one of the 
Apache mirrors, unpacked it, deleted my C:\Users\account\.ivy2\ directory (so 
that ivy would re-download everything), and ran 'ant idea' from a cmd window.  
BUILD SUCCESSFUL.

Steve

On Apr 25, 2013, at 6:07 AM, Shahar Davidson shah...@checkpoint.com wrote:

 Hi all,
 
 I'm trying to run 'ant idea' on 4.2.* and I'm getting invalid sha1 error 
 messages. (see below)
 
 I'll appreciate any help,
 
 Shahar
 ===
 .
 .
 .
 resolve
 ivy:retrieve
 
 :: problems summary ::
  WARNINGS
   problem while downloading module descriptor: 
 http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid 
 sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (72ms)
   problem while downloading module descriptor: 
 http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom:
  invalid sha1: expected=<!-- 
 computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms)
   problem while downloading module descriptor: 
 http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid 
 sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms)
   problem while downloading module descriptor: 
 http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: 
 invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 
 (58ms)
 
   module not found: org.apache.ant#ant;1.8.2
 .
 .
 .
    public: tried
 http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
    sonatype-releases: tried
 
 http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
    maven.restlet.org: tried
 http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
    working-chinese-mirror: tried
 
 http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
   problem while downloading module descriptor: 
 http://repo1.maven.org/maven2/junit/junit/4.10/junit-4.10.pom: invalid sha1: 
 expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (74ms)
   problem while downloading module descriptor: 
 http://oss.sonatype.org/content/repositories/releases/junit/junit/4.10/junit-4.10.pom:
  invalid sha1: expected=<!-- 
 computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms)
   problem while downloading module descriptor: 
 http://maven.restlet.org/junit/junit/4.10/junit-4.10.pom: invalid sha1: 
 expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (58ms)
   problem while downloading module descriptor: 
 http://mirror.netcologne.de/maven2/junit/junit/4.10/junit-4.10.pom: invalid 
 sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms)
 
   module not found: junit#junit;4.10
 .
 .
 .
 .
::
   ::  UNRESOLVED DEPENDENCIES ::
   ::
   :: org.apache.ant#ant;1.8.2: not found
   :: junit#junit;4.10: not found
   :: com.carrotsearch.randomizedtesting#junit4-ant;2.0.8: not 
 found
   :: 
 com.carrotsearch.randomizedtesting#randomizedtesting-runner;2.0.8: not found
   ::
 
 :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
 D:\apache_solr_4.2.1\lucene\common-build.xml:348: impossible to resolve 
 dependencies:
   resolve failed - see output for details



Re: Did something change with Payloads?

2013-04-25 Thread hariistou
Hi Jim,

I faced almost the same issue with payloads recently, and thought I would
rather write about it.
Please see the link below (my blog). I hope it helps.

http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/
http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/  

Additionally, as Mark Miller has said, with Solr 4.x you have to
add documents one by one during indexing to reflect payload scores
correctly. For example:

solr.addBean(doc);
solr.commit();

When you try to add documents as a collection through addBeans(), only one
.PAY file is created and all documents are scored as per the payload
score of the first document to be indexed.

There is surely some problem with Lucene 4.1 codec APIs. So for now the
above solution would work.
Probably, I need to write a sequel to my first article regarding the above
point on indexing. :)

Thanks,
Hari.







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Did-something-change-with-Payloads-tp4049561p4058919.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr maven install - authorization problem when downloading maven.restlet.org dependencies

2013-04-25 Thread Steve Rowe
Hi Shahar,

On a Windows 7 box, after downloading solr-4.2.1-src.tgz from one of the Apache 
mirrors and unpacking it, I did the following from a cmd window:

PROMPT> cd solr-4.2.1
PROMPT> ant get-maven-poms
PROMPT> cd maven-build
PROMPT> mvn install

Is the above what you did?

After a while, I see: 

-
[INFO] 
[INFO] Building Apache Solr Core
[INFO]task-segment: [install]
[INFO] 
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.pom
614b downloaded  (org.restlet-2.1.1.pom)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet.parent/2.1.1/org.restlet.parent-2.1.1.pom
7K downloaded  (org.restlet.parent-2.1.1.pom)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.pom
907b downloaded  (org.restlet.ext.servlet-2.1.1.pom)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.jar
[…]
709K downloaded  (org.restlet-2.1.1.jar)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.jar
19K downloaded  (org.restlet.ext.servlet-2.1.1.jar)
-

It's possible that the Restlet maven repository was temporarily malfunctioning. 
 Have you tried again?

Steve

On Apr 25, 2013, at 8:53 AM, Shahar Davidson shah...@checkpoint.com wrote:

 Hi,
 
 I'm trying to build Solr 4.2.x with Maven and I'm getting the following error 
 in solr-core:
 
 [INFO] 
 
 [INFO] BUILD FAILURE
 [INFO] 
 
 [INFO] Total time: 1.341s
 [INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
 [INFO] Final Memory: 12M/174M
 [INFO] 
 
 [ERROR] Failed to execute goal on project solr-core: Could not resolve 
 dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed 
 to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT 
 (compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), 
 org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), 
 commons-codec:commons-codec:jar:1.7 (compile), 
 commons-cli:commons-cli:jar:1.2 (compile), 
 commons-fileupload:commons-fileupload:jar:1.2.1 (compile), 
 org.restlet.jee:org.restlet:jar:2.1.1 (compile), 
 org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), 
 org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 
 (compile), commons-io:commons-io:jar:2.1 (compile), 
 commons-lang:commons-lang:jar:2.6 (compile), 
 com.google.guava:guava:jar:13.0.1 (compile), 
 org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), 
 org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), 
 org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), 
 org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), 
 javax.servlet:servlet-api:jar:2.4 (provided), 
 org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), 
 org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), 
 org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed 
 to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could 
 not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to 
 maven-restlet (http://maven.restlet.org): Not authorized, 
 ReasonPhrase:Unauthorized. - [Help 1]
 
 
 Has anyone encountered this issue?
 
 Thanks,
 
 Shahar.



Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-25 Thread Yonik Seeley
On Thu, Apr 25, 2013 at 8:32 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Are there any distrib facet gurus on the list? I would be ready to try
 sensible ideas, including on the source code level, if someone of you could
 give me a hand.

The Lucene/Solr Revolution conference is coming up next week, so I
think many are busy creating their presentations.
What version of Solr are you using?  Have you tried using a newer
version?  Is it reproducible with a smaller cluster?  If so, you could
try using the included Jetty server instead of Tomcat to rule out that
factor.

-Yonik
http://lucidworks.com


Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-25 Thread Dmitry Kan
Thanks, Yonik. Yes, I suspected that. We are in the pre-release phase, so we
are under pressure.

Solr 3.4.

Would setting up a 4.2.1 router work with 3.4 shards?
On 25 Apr 2013 17:11, Yonik Seeley yo...@lucidworks.com wrote:

 On Thu, Apr 25, 2013 at 8:32 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Are there any distrib facet gurus on the list? I would be ready to try
  sensible ideas, including on the source code level, if someone of you
 could
  give me a hand.

 The Lucene/Solr Revolution conference is coming up next week, so I
 think many are busy creating their presentations.
 What version of Solr are you using?  Have you tried using a newer
 version?  Is it reproducable with a smaller cluster?  If so, you could
 try using the included Jetty server instead of Tomcat to rule out that
 factor.

 -Yonik
 http://lucidworks.com



What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Gustav
Hello guys, I saw this thread on Stack Overflow, but I'm still not satisfied
with the answers.

I am trying to index data across multiple tables using Solr's Data Import
Handler. The official wiki on the DIH suggests using embedded entities to
link multiple tables like so:

<document>
  <entity name="item" pk="id" query="SELECT * FROM item">
    <entity name="member" pk="memberid"
            query="SELECT * FROM member WHERE memberid='${item.memberid}'"/>
  </entity>
</document>

Another way that works is:

<document>
  <entity name="item" pk="id"
          query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid"/>
</document>

Are these two methods functionally different? Is there a performance
difference?

Another thought would be that, when using join tables in MySQL, the SQL
query method with multiple joins could cause multiple documents to be
indexed instead of one.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-the-difference-between-a-Join-Query-and-Embedded-Entities-in-Solr-DIH-tp4058923.html
Sent from the Solr - User mailing list archive at Nabble.com.


Question on storage and index/data management in solr

2013-04-25 Thread Vinay Rai
Hi,
I am relatively new to solr and evaluating it for my project.

I would have lots of data coming in at a fast rate (say 10 MB per sec) and I 
would need the recent data (last 24 hours, or last 100GB) to be searchable 
faster than the old data. I did a bit of reading on the controls provided by 
solr and came across the concept of mergeFactor (defaults to 10) - this means 
solr merges every 10 segments into one.

However, I need something like this -

1. Keep each of last 24 hours segments separate.
2. Segments generated between last 48 to 24 hours to be merged into one. 
Similarly, for segments created between 72 to 48 hours and so on for last 1 
week.
3. Similarly, merge previous 4 week's data into one segment each week.
4. Merge all previous months data into one segment each month.

I am not sure if such a configuration is possible in Solr. If not,
are there APIs which will allow me to do this?

Also, I want to understand how Solr stores data, and whether it has a dependency
on the way data is stored. Since the volumes are high, it would be great if the
data were compressed when stored (while still searchable). If that is possible, I
would like to know what kind of compression Solr does.

Thank you for the responses.
 
Regards,
Vinay

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Thanks for your reply, but this solution does not fulfill my requirement
because other documents (not exactly matched) will be returned as well.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Alexandre Rafalovitch
I think the JOIN is more performant, as - by default - DIH will run an
inner query for each outer row. You can use a cached source, but the JOIN
will still be more efficient.

The nested entities are more useful when the sources are heterogeneous
(e.g. DB and XML) or when you need to apply custom transformers in
between.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Apr 25, 2013 at 10:17 AM, Gustav xbihy...@sharklasers.com wrote:
 Hello guys, i saw this thread on stackoverflow, but still not satisfied with
 the answers.

 I am trying to index data across multiple tables using Solr's Data Import
 Handler. The official wiki on the DIH suggests using embedded entities to
 link multiple tables like so:

  <document>
    <entity name="item" pk="id" query="SELECT * FROM item">
      <entity name="member" pk="memberid"
              query="SELECT * FROM member WHERE memberid='${item.memberid}'"/>
    </entity>
  </document>

  Another way that works is:

  <document>
    <entity name="item" pk="id"
            query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid"/>
  </document>

 Are these two methods functionally different? Is there a performance
 difference?

 Another though would be that, if using join tables in MySQL, using the SQL
 query method with multiple joins could cause multiple documents to be
 indexed instead of one.




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/What-is-the-difference-between-a-Join-Query-and-Embedded-Entities-in-Solr-DIH-tp4058923.html
 Sent from the Solr - User mailing list archive at Nabble.com.


RE: What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Dyer, James
Gustav,

DIH should give you the same results in both scenarios.  The performance 
trade-offs depend on your data.  In your case, it looks like there is a 1-to-1 
or many-to-1 relationship between item and member, so use the SQL join.  
You'll get all of your data in one query and you'll be using your rdbms for 
what it does best.

But in the case there was a 1-to-many relationship between item and member, 
and especially if each item has several member rows, you might get better 
performance using the child entity setup.  Although by default DIH is going to 
do an n+1 select on member.  For every row in item, it will issue a separate 
query to the db.  Also, DIH does not use prepared statements, so this might be 
a bad choice.  

To work around this, specify cacheImpl='SortedMapBackedCache' on the child 
entity (this is the same as using CachedSqlEntityProcessor instead of 
SqlEntityProcessor).  Do not include a where clause in this child entity.  
Instead, specify cacheKey='memberId' and cacheLookup='item.memberId'.  DIH 
will now pull down your entire member table in 1 query and cache it in 
memory, then it can do fast hash joins against item.

But if your member table is too big to fit into memory, then you need to use 
a disk-backed cache instead of SortedMapBackedCache.  For that, see 
https://issues.apache.org/jira/browse/SOLR-2948 and 
https://issues.apache.org/jira/browse/SOLR-2613 .
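
Putting that together, the child entity might look like this (column names
follow the earlier example):

<entity name="item" pk="id" query="SELECT * FROM item">
  <entity name="member" pk="memberid"
          query="SELECT * FROM member"
          cacheImpl="SortedMapBackedCache"
          cacheKey="memberid"
          cacheLookup="item.memberid"/>
</entity>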

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Gustav [mailto:xbihy...@sharklasers.com] 
Sent: Thursday, April 25, 2013 9:17 AM
To: solr-user@lucene.apache.org
Subject: What is the difference between a Join Query and Embedded Entities in 
Solr DIH?

Hello guys, i saw this thread on stackoverflow, but still not satisfied with
the answers. 

I am trying to index data across multiple tables using Solr's Data Import
Handler. The official wiki on the DIH suggests using embedded entities to
link multiple tables like so:

<document>
  <entity name="item" pk="id" query="SELECT * FROM item">
    <entity name="member" pk="memberid"
            query="SELECT * FROM member WHERE memberid='${item.memberid}'"/>
  </entity>
</document>

Another way that works is:

<document>
  <entity name="item" pk="id"
          query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid"/>
</document>

Are these two methods functionally different? Is there a performance
difference?

Another though would be that, if using join tables in MySQL, using the SQL
query method with multiple joins could cause multiple documents to be
indexed instead of one.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-the-difference-between-a-Join-Query-and-Embedded-Entities-in-Solr-DIH-tp4058923.html
Sent from the Solr - User mailing list archive at Nabble.com.




SolrJ Custom RowMapper

2013-04-25 Thread Luis Lebolo
Hi All,

Does SolrJ have an option for a custom RowMapper or BeanPropertyRowMapper
(I'm using Spring/JDBC terms).

I know the QueryResponse has a getBeans method, but I would like to create
my own mapping and plug it in.

Any pointers?
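
For context, the manual mapping I'd like to avoid hand-rolling everywhere
looks roughly like this (a sketch; class and method names are illustrative):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ManualMapping {

    /** The RowMapper-style hook I'd like to plug in. */
    interface SolrDocMapper<T> {
        T map(SolrDocument doc);
    }

    /** Runs a query and maps each SolrDocument by hand instead of getBeans(). */
    static <T> List<T> query(HttpSolrServer server, SolrQuery q,
                             SolrDocMapper<T> mapper) throws SolrServerException {
        QueryResponse rsp = server.query(q);
        List<T> out = new ArrayList<T>();
        for (SolrDocument doc : rsp.getResults()) {
            out.add(mapper.map(doc));
        }
        return out;
    }
}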

Thanks,
Luis


Using another way instead of DIH

2013-04-25 Thread xiaoqi
hi, all

Using DIH to build the index is slow: fetching 2 million rows takes
20 minutes, which is very slow.

I am not very familiar with Solr; I am trying to use Lucene to build the
index files directly from the DB and then move them into the Solr folder.

I am not sure that is the right way. Is there any other good way?

thanks a lot.


 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-another-way-instead-of-DIH-tp4058937.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Solr For a Real Search Engine

2013-04-25 Thread Otis Gospodnetic
Hi,

No, start.jar is not deployed.  That *is* Jetty.
This is what the real Embedded Jetty is about:
http://wiki.eclipse.org/Jetty/Tutorial/Embedding_Jetty

What we have here is Solr is just an *included* Jetty, so it's easier
to get started.  That's all. :)

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Thu, Apr 25, 2013 at 3:30 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Otis;

 You are right. start.jar starts up an Jetty and there is a war file under
 example directory and deploys start.jar to itself, is that true?

 2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com

 Suggestion :
 Don't call this embedded Jetty to avoid confusion with the actual embedded
 jetty.

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Apr 23, 2013 4:56 PM, Furkan KAMACI furkankam...@gmail.com wrote:

  Thanks for the answers. I will go with embedded Jetty for my SolrCloud.
 If
  I face with something important I would want to share my experiences with
  you.
 
  2013/4/23 Shawn Heisey s...@elyograg.org
 
   On 4/23/2013 2:25 PM, Furkan KAMACI wrote:
  
   Is there any documentation that explains using Jetty as embedded or
  not? I
   use Solr deployed at Tomcat but after you message I will consider
 about
   Jetty. If we think about other issues i.e. when I want to update my
 Solr
   jars/wars etc.(this is just an foo example) does any pros and cons
  Tomcat
   or Jetty has?
  
  
   The Jetty in the example is only 'embedded' in the sense that you don't
   have to install it separately.  It is not special -- the Jetty
 components
   are not changed at all, a subset of them is just included in the Solr
   download with a tuned configuration file.
  
   If you go to www.eclipse.org/jetty and download the latest stable-8
   version, you'll see some familiar things - start.jar, an etc
 directory, a
   lib directory, and a contexts directory.  They have more in them than
 the
   example does -- extra functionality Solr doesn't need.  If you want to
   start the downloaded version, you can use 'java -jar start.jar' just
 like
   you do with Solr.
  
   Thanks,
   Shawn
  
  
 



Re: what is the maximum XML file size to import?

2013-04-25 Thread Otis Gospodnetic
Hi,

Even if you could import giant files, I'd avoid it because it feels
like just asking for trouble.  Chunk the file into smaller pieces.
You can index such smaller pieces in parallel, too, and end up with
faster indexing as the result.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Thu, Apr 25, 2013 at 12:10 AM, Sharmila Thapa shar...@gmail.com wrote:
 Yes,
 I have again tried to post the XML of size 2.02GB; now it throws a different
 error message:
 http://lucene.472066.n3.nabble.com/file/n4058825/1.png

 While searching for the cause of this error message, I found that
 Java's setFixedLengthStreamingMode method throws it (reference:
 http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html).
 That method takes a signed 32-bit int for the content length, so the file
 size is limited to 2GB.
 I have also tried setting an unlimited Java heap size, but that does not help.

 So is there anything that can be done to support up to a 6GB XML file size? If
 possible I would like to try to use java -Durl to import the XML data. If
 this does not work, then I will try other alternatives as you have
 suggested, such as DIH.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/what-is-the-maximum-XML-file-size-to-import-tp4058263p4058825.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Solr For a Real Search Engine

2013-04-25 Thread Furkan KAMACI
Thanks, Otis I got it.

2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 No, start.jar is not deployed.  That *is* Jetty.
 This is what the real Embedded Jetty is about:
 http://wiki.eclipse.org/Jetty/Tutorial/Embedding_Jetty

 What we have here is Solr is just an *included* Jetty, so it's easier
 to get started.  That's all. :)

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Thu, Apr 25, 2013 at 3:30 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis;
 
  You are right. start.jar starts up an Jetty and there is a war file under
  example directory and deploys start.jar to itself, is that true?
 
  2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com
 
  Suggestion :
  Don't call this embedded Jetty to avoid confusion with the actual
 embedded
  jetty.
 
  Otis
  Solr  ElasticSearch Support
  http://sematext.com/
  On Apr 23, 2013 4:56 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
   Thanks for the answers. I will go with embedded Jetty for my
 SolrCloud.
  If
   I face with something important I would want to share my experiences
 with
   you.
  
   2013/4/23 Shawn Heisey s...@elyograg.org
  
On 4/23/2013 2:25 PM, Furkan KAMACI wrote:
   
Is there any documentation that explains using Jetty as embedded or
   not? I
use Solr deployed at Tomcat but after you message I will consider
  about
Jetty. If we think about other issues i.e. when I want to update my
  Solr
jars/wars etc.(this is just an foo example) does any pros and cons
   Tomcat
or Jetty has?
   
   
The Jetty in the example is only 'embedded' in the sense that you
 don't
have to install it separately.  It is not special -- the Jetty
  components
are not changed at all, a subset of them is just included in the
 Solr
download with a tuned configuration file.
   
If you go to www.eclipse.org/jetty and download the latest stable-8
version, you'll see some familiar things - start.jar, an etc
  directory, a
lib directory, and a contexts directory.  They have more in them
 than
  the
example does -- extra functionality Solr doesn't need.  If you want
 to
start the downloaded version, you can use 'java -jar start.jar' just
  like
you do with Solr.
   
Thanks,
Shawn
   
   
  
 



Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Jack Krupansky

Well then just do an exact match ONLY!

It sounds like you haven't worked out the inconsistencies in your 
requirements.


To be clear: We're not offering you solutions - that's your job. We're 
only pointing out tools that you can use. It is up to you to utilize the 
tools wisely to implement your solution.


I suspect that you simply haven't experimented enough with various boosts to 
assure that the unstemmed result is consistently higher.


Maybe you need a custom stemmer or a stemmer override so that "passengers" does 
get stemmed to "passenger", but "cats" does not (but "dogs" does.) That can 
be a choice that you can make, but I would urge caution. Still, it is a 
decision that you can make - it's not a matter of Solr forcing or preventing 
you. I still think boosting of an unstemmed field should be sufficient.
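
For instance, a stemmer override might be sketched like this (the dictionary
file name is illustrative; it holds tab-separated word/stem pairs, e.g. a line
mapping cats to cats so that it is left unstemmed):

<!-- placed before the stemmer in the analyzer chain -->
<filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt"
        ignoreCase="true"/>
<filter class="solr.SnowballPorterFilterFactory" language="English"/>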


But until you clarify the inconsistencies in your requirements, we won't be 
able to make much progress.


-- Jack Krupansky

-Original Message- 
From: vsl

Sent: Thursday, April 25, 2013 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching in Solr 3.6.1

Thanks for your reply but this solution does not fullfil my requirment
because other documents (not exact matched) will be returned as well.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html
Sent from the Solr - User mailing list archive at Nabble.com. 



How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
I have a ZooKeeper ensemble with three machines. I have started a cluster
with one shard. However, I decided to change my shard number. I want to
clean the ZooKeeper data, but whatever I do I always get one shard and the
rest of the added Solr nodes come up as replicas.

What should I do?


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
Agree with Jack.

The current field type text_general is designed to match the query tokens
instead of exact matches - so it's not able to fulfill your requirements.

Can you use a flat file (http://wiki.apache.org/solr/FileBasedSpellChecker) as
the spell check dictionary instead? That way you can search on the
exact-matched field while generating spell check suggestions from the file
instead of from the index.
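
A sketch of the solrconfig.xml side (file name and paths are illustrative):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <!-- plain text file, one term per line -->
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>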

-S


On 25 April 2013 16:25, Jack Krupansky j...@basetechnology.com wrote:

 Well then just do an exact match ONLY!

 It sounds like you haven't worked out the inconsistencies in your
 requirements.

 To be clear: We're not offering you solutions - that's your job. We're
 only pointing out tools that you can use. It is up to you to utilize the
 tools wisely to implement your solution.

 I suspect that you simply haven't experimented enough with various boosts
 to assure that the unstemmed result is consistently higher.

 Maybe you need a custom stemmer or stemmer overide so that passengers
 does get stemmed to passenger, but cats does not (but dogs does.)
 That can be a choice that you can make, but I would urge caution. Still, it
 is a decision that you can make - it's not a matter of Solr forcing or
 preventing you. I still think boosting of an unstemmed field should be
 sufficient.

 But until you clarify the inconsistencies in your requirements, we won't
 be able to make much progress.


 -- Jack Krupansky

 -Original Message- From: vsl
 Sent: Thursday, April 25, 2013 10:45 AM

 To: solr-user@lucene.apache.org
 Subject: Re: Exact matching in Solr 3.6.1

 Thanks for your reply but this solution does not fullfil my requirment
 because other documents (not exact matched) will be returned as well.



 --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Michael Della Bitta
This is what I have done.

1. Turn off all your Solr nodes.

2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On
my machine, it's in /usr/lib/zookeeper/bin.

3. If you've chrooted Solr, just rmr /solr_chroot_dir.  Otherwise, use
rmr to delete these files and folders (see the session sketch after this list):

clusterstate.json
aliases.json
live_nodes
overseer
overseer_elect
collections

If you use a chroot jail, make it again with create /solr_chroot_dir []

4. Use Solr's zkCli to upload your configs again.

5. Start all your Solr nodes.

6. Create your collections again.
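
For step 3, the session might look like this (host, port, and the path list
are illustrative):

$ /usr/lib/zookeeper/bin/zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] rmr /clusterstate.json
[zk: localhost:2181(CONNECTED) 1] rmr /aliases.json
[zk: localhost:2181(CONNECTED) 2] rmr /live_nodes
[zk: localhost:2181(CONNECTED) 3] rmr /overseer
[zk: localhost:2181(CONNECTED) 4] rmr /overseer_elect
[zk: localhost:2181(CONNECTED) 5] rmr /collections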

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 I have a Zookeepeer ensemble with three machines. I have started a cluster
 with one shard. However I decided to change my shard number. I want to
 clean Zookeeper data but whatever I do I always get one shard and rest of
 added Solr nodes are as replica.

 What should I do?


RE: Using another way instead of DIH

2013-04-25 Thread Dyer, James
If you post your data-config.xml here, someone might be able to find something 
you could change to speed things up.  If the issue is parallelization, then you 
could possibly partition your data somehow and then run multiple DIH request 
handlers at the same time.  This might be easier than writing your own update 
program.

If you still think you need to write something custom, see this:  
http://wiki.apache.org/solr/Solrj#Adding_Data_to_Solr
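
The gist of that page, as a sketch (URL, field names, and batch size are
illustrative); batching the add() calls is usually the biggest win:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 2000000; i++) {     // e.g. rows streamed from the DB
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", i);
            doc.addField("name", "row " + i);
            batch.add(doc);
            if (batch.size() == 1000) {         // send in batches, not one by one
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();                        // one commit at the end
    }
}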

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: xiaoqi [mailto:belivexia...@gmail.com] 
Sent: Thursday, April 25, 2013 10:01 AM
To: solr-user@lucene.apache.org
Subject: Using another way instead of DIH

hi,all

i using DIH to build index is slow , when it fetch 2 million rows , it will
spend 20 minutes , very slow. 

i am not very familar with  solr , try to   using lucene direct building
index file from db then move to solr folder.

i am not sure ,that is right way. or any other good way? 

thanks a lot .


 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-another-way-instead-of-DIH-tp4058937.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Deletes and inserts

2013-04-25 Thread Jon Strayer
Thanks Michael,
  How do you handle configurations in zookeeper?  I tried reusing the same
configuration but I'm getting an error message that may mean that doesn't
work.  Or maybe I'm doing something wrong.


On Wed, Apr 24, 2013 at 12:50 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 We're using aliases to control visibility of collections we rebuild
 from scratch nightly. It works pretty well. If you run CREATEALIAS
 again, it'll switch to a new one, not augment the old one.

 If for some reason, you want to bridge more than one collection, you
 can add more than one collection to the alias at creation time, but
 then it becomes read-only.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer j...@strayer.org wrote:
  We are using a Solr collection to serve auto complete suggestions.  We'd
  like for the update to be without any noticeable delay for the users.
 
  I've been looking at adding new cores, loading them with the new data and
  then swapping them with the current ones, but but I don't see how that
  would work in a cloud installation.  It seems that when I create a new
 core
  it is part of the collection and the old data will start replicating to
 it.
   Is that correct?
 
  I've also looked at standing up a new collection and then adding an alias
  for it, but that's not well documented.  If the alias already exists and
 I
  add to to another collection is it removed from the first collection?
 
  I'm open to any suggestions.
 
  --
  To *know* is one thing, and to know for certain *that* we know is
 another.
  --William James




-- 
To *know* is one thing, and to know for certain *that* we know is another.
--William James


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Otis Gospodnetic
Nice.  Sounds like FAQ/Wiki material, Mike! :)

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Thu, Apr 25, 2013 at 11:33 AM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 This is what I have done.

 1. Turn off all your Solr nodes.

 2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On
 my machine, it's in /usr/lib/zookeeper/bin.

 3. If you've chrooted Solr, just rmr /solr_chroot_dir.  Otherwise, use
 rmr to delete these files and folders:

 clusterstate.json
 aliases.json
 live_nodes
 overseer
 overseer_elect
 collections

 If you use a chroot jail, make it again with create /solr_chroot_dir []

 4. Use Solr's zkCli to upload your configs again.

 5. Start all your Solr nodes.

 6. Create your collections again.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI furkankam...@gmail.com 
 wrote:
 I have a Zookeepeer ensemble with three machines. I have started a cluster
 with one shard. However I decided to change my shard number. I want to
 clean Zookeeper data but whatever I do I always get one shard and rest of
 added Solr nodes are as replica.

 What should I do?


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
You said: Otherwise, use rmr to delete these files and folders.

Can you give an example?


2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com

 Nice.  Sounds like FAQ/Wiki material, Mike! :)

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Thu, Apr 25, 2013 at 11:33 AM, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
  This is what I have done.
 
  1. Turn off all your Solr nodes.
 
  2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On
  my machine, it's in /usr/lib/zookeeper/bin.
 
  3. If you've chrooted Solr, just rmr /solr_chroot_dir.  Otherwise, use
  rmr to delete these files and folders:
 
  clusterstate.json
  aliases.json
  live_nodes
  overseer
  overseer_elect
  collections
 
  If you use a chroot jail, make it again with create /solr_chroot_dir []
 
  4. Use Solr's zkCli to upload your configs again.
 
  5. Start all your Solr nodes.
 
  6. Create your collections again.
 
  Michael Della Bitta
 
  
  Appinions
  18 East 41st Street, 2nd Floor
  New York, NY 10017-6271
 
  www.appinions.com
 
  Where Influence Isn’t a Game
 
 
  On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  I have a Zookeepeer ensemble with three machines. I have started a
 cluster
  with one shard. However I decided to change my shard number. I want to
  clean Zookeeper data but whatever I do I always get one shard and rest
 of
  added Solr nodes are as replica.
 
  What should I do?



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Mark Miller
What are you doing to clean zk?

You should be able to simply use the ZkCli clear cmd:

http://wiki.apache.org/solr/SolrCloud#Command_Line_Util

Just make sure you stop your Solr instances before clearing it. Clearing out zk 
from under a running Solr instance is not a good thing to do.

This should be as simple as, stop your Solr instances, use the clean command on 
/ or /solr (whatever the root is in zk for you Solr stuff), start your Solr 
instances, create the collection again.
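
For example (the zkhost string and the /solr chroot are illustrative, and all
Solr instances should be stopped first):

example/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd clear /solr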

- Mark

On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 I have a Zookeepeer ensemble with three machines. I have started a cluster
 with one shard. However I decided to change my shard number. I want to
 clean Zookeeper data but whatever I do I always get one shard and rest of
 added Solr nodes are as replica.
 
 What should I do?



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Mark Miller
Of course deleting the collection and then recreating it should also work - if 
it doesn't, there is a bug to address.

- Mark

On Apr 25, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:

 What are you doing to clean zk?
 
 You should be able to simply use the ZkCli clear cmd:
 
 http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
 
 Just make sure you stop your Solr instances before clearing it. Clearing out 
 zk from under a running Solr instance is not a good thing to do.
 
 This should be as simple as, stop your Solr instances, use the clean command 
 on / or /solr (whatever the root is in zk for you Solr stuff), start your 
 Solr instances, create the collection again.
 
 - Mark
 
 On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 
 I have a Zookeepeer ensemble with three machines. I have started a cluster
 with one shard. However I decided to change my shard number. I want to
 clean Zookeeper data but whatever I do I always get one shard and rest of
 added Solr nodes are as replica.
 
 What should I do?
 



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
Hi;
If you can help it would be nice:

I have erased the data. I use that commands:

Firstly I do that:

java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
-Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2
-Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf
-Dcollection.configName=myconf -jar start.jar

and do that:

java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
-Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar

However, when I look at the graph in the Admin GUI there is only one shard and
two replicas. What is the problem? Why is it not two shards?


2013/4/25 Mark Miller markrmil...@gmail.com

 Of course deleting the collection and then recreating it should also work
 - if it doesn't, there is a bug to address.

 - Mark

 On Apr 25, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:

  What are you doing to clean zk?
 
  You should be able to simply use the ZkCli clear cmd:
 
  http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
 
  Just make sure you stop your Solr instances before clearing it. Clearing
 out zk from under a running Solr instance is not a good thing to do.
 
  This should be as simple as, stop your Solr instances, use the clean
 command on / or /solr (whatever the root is in zk for you Solr stuff),
 start your Solr instances, create the collection again.
 
  - Mark
 
  On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
  I have a Zookeepeer ensemble with three machines. I have started a
 cluster
  with one shard. However I decided to change my shard number. I want to
  clean Zookeeper data but whatever I do I always get one shard and rest
 of
  added Solr nodes are as replica.
 
  What should I do?
 




Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Michael Della Bitta
Today I learned there's a clear command in the command line util. :)

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 25, 2013 at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:
 What are you doing to clean zk?

 You should be able to simply use the ZkCli clear cmd:

 http://wiki.apache.org/solr/SolrCloud#Command_Line_Util

 Just make sure you stop your Solr instances before clearing it. Clearing out 
 zk from under a running Solr instance is not a good thing to do.

 This should be as simple as, stop your Solr instances, use the clean command 
 on / or /solr (whatever the root is in zk for you Solr stuff), start your 
 Solr instances, create the collection again.

 - Mark

 On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 I have a Zookeepeer ensemble with three machines. I have started a cluster
 with one shard. However I decided to change my shard number. I want to
 clean Zookeeper data but whatever I do I always get one shard and rest of
 added Solr nodes are as replica.

 What should I do?



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
Ooppss, I wrote numshards, I think it should be numShards

2013/4/25 Michael Della Bitta michael.della.bi...@appinions.com

 Today I learned there's a clear command in the command line util. :)

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Thu, Apr 25, 2013 at 12:00 PM, Mark Miller markrmil...@gmail.com
 wrote:
  What are you doing to clean zk?
 
  You should be able to simply use the ZkCli clear cmd:
 
  http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
 
  Just make sure you stop your Solr instances before clearing it. Clearing
 out zk from under a running Solr instance is not a good thing to do.
 
  This should be as simple as, stop your Solr instances, use the clean
 command on / or /solr (whatever the root is in zk for you Solr stuff),
 start your Solr instances, create the collection again.
 
  - Mark
 
  On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
  I have a Zookeepeer ensemble with three machines. I have started a
 cluster
  with one shard. However I decided to change my shard number. I want to
  clean Zookeeper data but whatever I do I always get one shard and rest
 of
  added Solr nodes are as replica.
 
  What should I do?
 



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Mark Miller
I think it's numShards, not numshards.

- Mark

On Apr 25, 2013, at 12:07 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;
 If you can help it would be nice:
 
 I have erased the data. I use that commands:
 
 Firstly I do that:
 
 java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
 -Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2
 -Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf
 -Dcollection.configName=myconf -jar start.jar
 
 and do that:
 
 java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
 -Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar
 
 However when I look at the graph at Admin GUI there is only one shard but
 two replicas? What is the problem why it is not two shards?
 
 
 2013/4/25 Mark Miller markrmil...@gmail.com
 
 Of course deleting the collection and then recreating it should also work
 - if it doesn't, there is a bug to address.
 
 - Mark
 
 On Apr 25, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:
 
 What are you doing to clean zk?
 
 You should be able to simply use the ZkCli clear cmd:
 
 http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
 
 Just make sure you stop your Solr instances before clearing it. Clearing
 out zk from under a running Solr instance is not a good thing to do.
 
 This should be as simple as, stop your Solr instances, use the clean
 command on / or /solr (whatever the root is in zk for you Solr stuff),
 start your Solr instances, create the collection again.
 
 - Mark
 
 On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
 I have a Zookeepeer ensemble with three machines. I have started a
 cluster
 with one shard. However I decided to change my shard number. I want to
 clean Zookeeper data but whatever I do I always get one shard and rest
 of
 added Solr nodes are as replica.
 
 What should I do?
 
 
 



Re: How do set compression for compression on stored fields in SOLR 4.2.1

2013-04-25 Thread Otis Gospodnetic
Hi,

Is the question how/where to set that?
This is what I found in my repo checkout:

$ ffxg COMPRE
./core/src/test-files/solr/collection1/conf/solrconfig-slave.xml:
  <str name="compression">COMPRESSION</str>

Hm, but that's about replication compression.  Maybe we don't have any
examples of this in configs?

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Apr 24, 2013 at 3:06 PM, William Bell billnb...@gmail.com wrote:
 https://issues.apache.org/jira/browse/LUCENE-4226
It mentions that we can set the compression mode:
FAST, HIGH_COMPRESSION, FAST_DECOMPRESSION.


 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
Ok, it works

2013/4/25 Mark Miller markrmil...@gmail.com

 I think it's numShards, not numshards.

 - Mark

 On Apr 25, 2013, at 12:07 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:

  Hi;
  If you can help it would be nice:
 
  I have erased the data. I use that commands:
 
  Firstly I do that:
 
  java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
  -Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2
  -Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf
  -Dcollection.configName=myconf -jar start.jar
 
  and do that:
 
  java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
  -Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar
 
   However, when I look at the graph in the Admin GUI there is only one shard
   with two replicas. What is the problem? Why is it not two shards?
 
 
  2013/4/25 Mark Miller markrmil...@gmail.com
 
  Of course deleting the collection and then recreating it should also
 work
  - if it doesn't, there is a bug to address.
 
  - Mark
 
  On Apr 25, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  What are you doing to clean zk?
 
  You should be able to simply use the ZkCli clear cmd:
 
  http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
 
  Just make sure you stop your Solr instances before clearing it.
 Clearing
  out zk from under a running Solr instance is not a good thing to do.
 
   This should be as simple as: stop your Solr instances, use the clear
   command on / or /solr (whatever the root is in zk for your Solr stuff),
  start your Solr instances, create the collection again.
 
  - Mark
 
  On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com
  wrote:
 
   I have a Zookeeper ensemble with three machines. I have started a cluster
   with one shard. However, I decided to change my shard number. I want to
   clean the Zookeeper data, but whatever I do I always get one shard, and
   the rest of the added Solr nodes come up as replicas.
 
  What should I do?
 
 
 




Re: Deletes and inserts

2013-04-25 Thread Michael Della Bitta
We've successfully reused the same config in Zookeeper across multiple
collections and using aliases.
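
For reference, repointing an alias at a freshly built collection is a
single call along these lines (the alias and collection names here are
invented):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=suggest&collections=suggest_20130425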

Could you describe your problem? What does the error say?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 25, 2013 at 11:44 AM, Jon Strayer j...@strayer.org wrote:
 Thanks Michael,
   How do you handle configurations in zookeeper?  I tried reusing the same
 configuration but I'm getting an error message that may mean that doesn't
 work.  Or maybe I'm doing something wrong.


 On Wed, Apr 24, 2013 at 12:50 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

 We're using aliases to control visibility of collections we rebuild
 from scratch nightly. It works pretty well. If you run CREATEALIAS
 again, it'll switch to a new one, not augment the old one.

 If for some reason, you want to bridge more than one collection, you
 can add more than one collection to the alias at creation time, but
 then it becomes read-only.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer j...@strayer.org wrote:
  We are using a Solr collection to serve auto complete suggestions.  We'd
  like for the update to be without any noticeable delay for the users.
 
  I've been looking at adding new cores, loading them with the new data and
  then swapping them with the current ones, but I don't see how that
  would work in a cloud installation.  It seems that when I create a new
 core
  it is part of the collection and the old data will start replicating to
 it.
   Is that correct?
 
  I've also looked at standing up a new collection and then adding an alias
  for it, but that's not well documented.  If the alias already exists and
 I
  add to to another collection is it removed from the first collection?
 
  I'm open to any suggestions.
 
  --
  To *know* is one thing, and to know for certain *that* we know is
 another.
  --William James




 --
 To *know* is one thing, and to know for certain *that* we know is another.
 --William James


Re: how to get display Jessionid with solr results

2013-04-25 Thread Michael Della Bitta
You should look into the documentation of your load balancer to see
how you can enable sticky sessions. If you've already done that and
the load balancer requires jsessionid rather than using it's own
sticky session method, it looks like documentation for using
jsessionid with Jetty is here:
http://wiki.eclipse.org/Jetty/Howto/SessionIds

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Apr 24, 2013 at 6:36 PM, gpssolr2020 psgoms...@gmail.com wrote:
 Hi,

 We are using Jetty as a container for Solr 3.6. We have two slave servers
 to serve user queries, and queries are distributed to either slave through
 a load balancer.

 When a user sends a first search request, say it goes to slave1; when that
 user queries again, we want to send the query to the same server with the
 help of the JSESSIONID.

 how to achieve this? How to get that Jsessionid with solr search results?
 Please provide your suggestions.

 Thanks.




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-get-display-Jessionid-with-solr-results-tp4058751.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Need to log query request before it is processed

2013-04-25 Thread Timothy Potter
I would like to log query requests before they are processed.
Currently, it seems they are only logged after being processed. I've
tried enabling a finer logging level but that didn't seem to help.
I've enabled request logging in Jetty, but most queries come in as
POSTs from SolrJ.

I was thinking of adding a query request logger as a first-component
but wanted to see what others have done for this?
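
Something like this in solrconfig.xml is what I had in mind, where
RequestLoggingComponent is a hypothetical class I would write:

<searchComponent name="requestLogger"
                 class="com.example.RequestLoggingComponent"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="first-components">
    <str>requestLogger</str>
  </arr>
</requestHandler>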

Thanks.
Tim


Re: Problem with solr deployment on weblogic 10.3

2013-04-25 Thread Shawn Heisey
On 4/25/2013 12:04 AM, Shawn Heisey wrote:
 It looks like the solution is adding some config to the weblogic.xml
 file in the solr.war so that weblogic prefers application classes.  I
 filed SOLR-4762.  I do not know if this change might have unintended
 consequences.
 
 http://ananthkannan.blogspot.com/2009/08/beware-of-stringutilscontainsignorecase.html
 
 https://issues.apache.org/jira/browse/SOLR-4762

Radhakrishna: Do you know how to extract solr.war, change the
WEB-INF/weblogic.xml file, and repack it?   I have created a patch for
the Solr source code, but I don't have weblogic, so I can't test it to
make sure it works.  I am running tests to make sure that the change
doesn't break anything else.
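
For reference, the kind of addition involved looks like this (untested by
me, for the reason above):

<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>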

Alternatively, you could download the source code, apply the patch I
uploaded to SOLR-4762, build Solr, and try the changed version.  You
never said what version of Solr you are using.  The important part of
the patch should apply correctly to the source of most versions.  The
CHANGES.txt part of the patch will fail on anything older than the 4.x
dev branch (4.4), but that's not an important part of the patch.

Thanks,
Shawn



Re: What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Shawn Heisey
On 4/25/2013 8:17 AM, Gustav wrote:
 Are these two methods functionally different? Is there a performance
 difference?
 
 Another though would be that, if using join tables in MySQL, using the SQL
 query method with multiple joins could cause multiple documents to be
 indexed instead of one.

They may be equivalent in terms of results, but they work differently
and probably will NOT have the same performance.

When using nested entities in DIH, the main entity results in one SQL
query, but the inner entities will result in a separate SQL query for
every single item returned by the main query.  If you have exactly 1
million rows in your main table and you're using a nested config with
two entities, you will be executing 101 queries.  DIH will be
spending a fair amount of time doing nothing but waiting for the latency
on a million individual queries via JDBC.  It probably also results in
extra work for the database server.

With a server-side join, you're down to one query via JDBC, and the
database server is doing the work of combining your tables, normally
something it can do very efficiently.
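
For illustration, the single-query version of a DIH data-config.xml entity
might look something like this (table and column names are invented):

<document>
  <entity name="item"
          query="SELECT i.id, i.name, d.detail
                 FROM item i JOIN detail d ON d.item_id = i.id"/>
</document>

as opposed to a nested config where an inner entity runs something like
query="SELECT detail FROM detail WHERE item_id='${item.id}'" once per row.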

Thanks,
Shawn



Re: Question on storage and index/data management in solr

2013-04-25 Thread Shawn Heisey
On 4/25/2013 8:39 AM, Vinay Rai wrote:
 1. Keep each of last 24 hours segments separate.
 2. Segments generated between last 48 to 24 hours to be merged into one. 
 Similarly, for segments created between 72 to 48 hours and so on for last 1 
 week.
 3. Similarly, merge previous 4 week's data into one segment each week.
 4. Merge all previous months data into one segment each month.
 
 I am not sure if there is a configuration possible in solr application. If 
 not, are there APIs which will allow me to do this?

To accomplish this exact scenario, you would probably have to write a
custom merge policy class for Lucene.  If you do so, I hope you'll
strongly consider donating it to the Lucene/Solr project.

Another approach: Use distributed search and put the divisions you are
looking at into separate indexes (shards) in their own cores.  You can
then manually do whatever index merging your situation requires.
Constructing the shards parameter for your queries will take some work.
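
A query across such time-sliced cores would then look something like this
(the host and core names are purely illustrative):

http://host:8983/solr/aggregator/select?q=*:*&shards=host:8983/solr/hour_00,host:8983/solr/day_2013_04_24,host:8983/solr/week_2013_16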

Here's a blog post about this method and a video of the Lucene
Revolution talk mentioned in the blog post:

http://www.loggly.com/blog/2010/08/our-solr-system/
http://loggly.com/videos/lucene-revolution-2010/

I had the honor of being there for that talk in Boston.  They've done
some amazing things with Solr.

 Also, I want to understand how solr stores data or does it have a dependency 
 on the way data is stored. Since the volumes are high, it would be great if 
 the data is compressed and stored (while still searchable). If it is 
 possible, I would like to know what kind of compression does solr do?

Solr 4.1 uses compression for stored fields.  Solr 4.2 also uses
compression for term vectors.  From a performance perspective,
compression is probably not viable at this time for the indexed data,
but if that changes in the future, I'm sure that it will be added.

Here is documentation on the file format used by Solr 4.2:

http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Thanks,
Shawn



Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Shawn Heisey
On 4/25/2013 6:30 AM, Dmitry Kan wrote:
 We are very much interested in 3.4.
 
 On Thu, Apr 25, 2013 at 12:55 PM, Alan Woodward a...@flax.co.uk wrote:
 This is on top of trunk at the moment, but would be back ported to 4.4 if
 there was interest.

This will be bad news, I'm sorry:

All remaining work on 3.x versions happens in the 3.6 branch. This
branch is in maintenance mode.  It will only get fixes for serious bugs
with no workaround.  Improvements and new features won't be considered
at all.

You're welcome to try backporting patches from newer issues.  Due to the
major differences in the 3x and 4x codebases, the best case scenario is
that you'll be facing a very manual task.  Some changes can't be
backported because they rely on other features only found in 4.x code.

Thanks,
Shawn



Atomic update issue with 4.0 and 4.2.1

2013-04-25 Thread David Fennessey
Hi everyone ,

We have hit this strange bug using the atomic update functionality of both SOLR 
4.0 and SOLR 4.2.1.

We're currently posting a JSON-formatted file to the core's updater using a 
simple curl call; however, we've run into a very bizarre error where 
periodically it will fail and return a 400 error message. If we send the exact 
same request and file 5 minutes later, sometimes it will be accepted and return 
a 200, and other times it will continue to throw 400s.  This tends to happen 
when Solr is receiving a lot of updates, and restarting Tomcat seems to clear 
up the issue, but I feel that there is probably something important that I am 
missing.

The error message that it throws is quite strange and I don't really feel that 
it means very much because we can fire the exact same message 5 minutes later 
and it will happily fill that field. I am positive that I am only sending the 
value 965.00 in this case.

2013-04-25 00:20:39,373 [ERROR] org.apache.solr.core.SolrCore 
org.apache.solr.common.SolrException: ERROR: [doc=1764656] Error adding field 
'maxPrice'='java.math.BigDecimal:965.' msg=For input string: 
java.math.BigDecimal:965.
---at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:300)
---at 
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
---at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:199)
---at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
---at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
---at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
---at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
---at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
---at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
---at 
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:387)
---at 
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:112)
---at 
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:96)
---at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:60)
---at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
---at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
---at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
---at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
---at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
---at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
---at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
---at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
---at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
---at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
---at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
---at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
---at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
---at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
---at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
---at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
---at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
---at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
---at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
---at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
---at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
---at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NumberFormatException: For input string: 
java.math.BigDecimal:965.
---at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
---at java.lang.Float.parseFloat(Float.java:452)
---at org.apache.solr.schema.TrieField.createField(TrieField.java:598)
---at org.apache.solr.schema.TrieField.createFields(TrieField.java:655)
---at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:180)
---at 

Cloudspace and Solr Support Page

2013-04-25 Thread Nina Talley
Hi there,

 We offer Solr support and were wondering how we would go about being added
to the Solr Support page http://wiki.apache.org/solr/Support? Thanks so
much for your time!

-- 
Nina Talley, Account Manager
Cloudspace (http://www.cloudspace.com/)
Office: 877.823.8808
11551 University Blvd, Suite 2
Orlando, FL 32817


Massive Positions Files

2013-04-25 Thread Mike
Hi All,

I'm indexing a pretty large collection of documents (about 500K relatively
long documents taking up 1TB space, mostly in MS Office formats), and am
confused about the file sizes in the index.  I've gotten through about 180K
documents, and the *.pos files add up to 325GB, while the all of the rest
combined are using less than 5GB--including some large stored fields and
term vectors.  It makes sense to me that the compression on stored fields
helps to keep that part down on large text fields, and that term vectors
wouldn't be too big since they don't need position information, but the
magnitude of the difference is alarming.  Is that to be expected?  Is there
any way to reduce the size of the positions index if phrase searching is a
requirement?

I am using Solr 4.2.1.  These documents have a number of small
metadata elements, along with the big content field.  Like the default
schema, I'm storing but not indexing the content field, and a lot of the
fields get put into a catchall that is indexed and uses term vectors, but
is not stored.

Thanks,
Mike


Re: Cloudspace and Solr Support Page

2013-04-25 Thread Jan Høydahl
Hi,

Just give your WIKI user name and we'll give you access to edit that page to 
add yourself.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

25. apr. 2013 kl. 21:39 skrev Nina Talley n...@cloudspace.com:

 Hi there,
 
 We offer Solr support and were wondering how we would go about being added
 to the Solr Support page http://wiki.apache.org/solr/Support? Thanks so
 much for your time!
 
 -- 
 Nina Talley, Account Manager
 Cloudspace (http://www.cloudspace.com/)
 Office: 877.823.8808
 11551 University Blvd, Suite 2
 Orlando, FL 32817



Re: Reordered DBQ.

2013-04-25 Thread Marcin Rzewucki
OK. Thanks for explanation.


On 23 April 2013 23:16, Yonik Seeley yo...@lucidworks.com wrote:

 On Tue, Apr 23, 2013 at 3:51 PM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:
  Recently I noticed a lot of Reordered DBQs detected messages in logs.
 As
  far as I checked in logs it could be related with deleting documents, but
  not sure. Do you know what is the reason of those messages ?

 For high throughput indexing, we version updates on the leader and
 forward onto other replicas w/o strict serialization.
 If on a leader, an add happened before a DBQ, then on a replica the
 DBQ is serviced before the add, Solr detects this reordering and fixes
 it.
 It's not an error or an indication that anything is wrong (hence the
 INFO level log message).

 -Yonik
 http://lucidworks.com



How To Make Index Backup at SolrCloud?

2013-04-25 Thread Furkan KAMACI
I use SolrCloud. Let's assume that I want to move all indexes from one
place to another. There may be two reasons for that:

The first is that I will shut down my whole system and, some time later,
bring it up somewhere else on new machines with the previous indexes (if
it is a must, they may have the same network topology).
The second is that I know SolrCloud handles failures, but I still want to
back up my indexes in case of a disaster.

How can I back up my indexes? I know that I can start up new nodes and
close the old ones so I can move my indexes to other machines. However,
how can I do such a backup (should I just copy the data folders of the
Solr nodes and put them on new Solr nodes after I change the Zookeeper
configuration)?

What folks do?


Re: filter before facet

2013-04-25 Thread Daniel Tyreus
On Thu, Apr 25, 2013 at 12:35 AM, Toke Eskildsen 
t...@statsbiblioteket.dk wrote:



  This leads me to believe that the FQ is being applied AFTER the facets
 are
  calculated on the whole data set. For my use case it would make a ton of
  sense to apply the FQ first and then facet. Is it possible to specify
 this
  behavior or do I need to get into the code and get my hands dirty?


 As for creating a new faceting implementation that avoids the startup
 penalty by using only the found documents, then it is technically quite
 simple: Use stored fields, iterate the hits and request the values.
 Unfortunately this scales poorly with the number of hits, so unless you
 can guarantee that you will always have small result sets, this is
 probably not a viable option.


Thank you Toke for your detailed reply. I have perhaps an unusual use case
where we may have hundreds of thousands of users each with a few thousand
documents. On some queries I can guarantee the result size will be small
compared to the entire corpus since I'm filtering on one user's documents.
I may give this alternative faceting implementation a try.
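
A rough SolrJ sketch of that approach, for my own notes (the field names
are placeholders, and it assumes the faceted field is stored):

SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("user_id:12345");  // guarantees a small result set
q.setRows(5000);
QueryResponse rsp = server.query(q);

Map<String, Integer> counts = new HashMap<String, Integer>();
for (SolrDocument doc : rsp.getResults()) {
  String value = (String) doc.getFieldValue("category");
  if (value == null) continue;
  Integer c = counts.get(value);
  counts.put(value, c == null ? 1 : c + 1);
}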

Best regards,
Daniel


Re: Massive Positions Files

2013-04-25 Thread Jack Krupansky
These are the postings for all terms - the lists of positions for every 
occurrence of every term for all documents. Sounds to me like it could be 
huge.


Did you try a back of the envelope calculation?

325 GB divided by 180K = about 1.8 MB per doc (call it 2 MB).

How many words in a document? You say they are long.

Even if there were only 500,000 to 1,000,000 postings per long document, that 
would work out to 2 to 4 bytes or so per posting. I have no idea how big an 
average term posting might be, but these numbers do not seem at all 
unreasonable.


Now, let's see what kind of precise answer the Lucene guys give you!

-- Jack Krupansky

-Original Message- 
From: Mike

Sent: Thursday, April 25, 2013 4:00 PM
To: solr-user@lucene.apache.org
Subject: Massive Positions Files

Hi All,

I'm indexing a pretty large collection of documents (about 500K relatively
long documents taking up 1TB space, mostly in MS Office formats), and am
confused about the file sizes in the index.  I've gotten through about 180K
documents, and the *.pos files add up to 325GB, while all of the rest
combined are using less than 5GB--including some large stored fields and
term vectors.  It makes sense to me that the compression on stored fields
helps to keep that part down on large text fields, and that term vectors
wouldn't be too big since they don't need position information, but the
magnitude of the difference is alarming.  Is that to be expected?  Is there
any way to reduce the size of the positions index if phrase searching is a
requirement?

I am using Solr 4.2.1.  These documents have a number of small
metadata elements, along with the big content field.  Like the default
schema, I'm storing but not indexing the content field, and a lot of the
fields get put into a catchall that is indexed and uses term vectors, but
is not stored.

Thanks,
Mike 



Re: Need to log query request before it is processed

2013-04-25 Thread Sudhakar Maddineni
Hi Tim,
  Have you tried enabling the logging levels on httpclient, which is
used by the SolrJ classes internally?
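
For example, with log4j, something along these lines should surface the
requests as they go out (these are the HttpClient 4.x logger categories
that SolrJ uses):

log4j.logger.org.apache.http=DEBUG
log4j.logger.org.apache.http.wire=DEBUG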

Thx,Sudhakar.


On Thu, Apr 25, 2013 at 10:12 AM, Timothy Potter thelabd...@gmail.com wrote:

 I would like to log query requests before they are processed.
 Currently, it seems they are only logged after being processed. I've
 tried enabling a finer logging level but that didn't seem to help.
 I've enabled request logging in Jetty, but most queries come in as
 POSTs from SolrJ.

 I was thinking of adding a query request logger as a first-component
 but wanted to see what others have done for this?

 Thanks.
 Tim



Re: SolrJ Custom RowMapper

2013-04-25 Thread Sudhakar Maddineni
Hey Luis,
Check this example in the source:TestDocumentObjectBinder
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/solrj/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
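
The short version (a sketch; the Item bean and its fields are invented):

public class Item {
  @Field("id") String id;
  @Field("name") String name;
}

DocumentObjectBinder binder = new DocumentObjectBinder();
List<Item> items = binder.getBeans(Item.class, queryResponse.getResults());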

Thx,Sudhakar.



On Thu, Apr 25, 2013 at 7:56 AM, Luis Lebolo luis.leb...@gmail.com wrote:

 Hi All,

 Does SolrJ have an option for a custom RowMapper or BeanPropertyRowMapper
 (I'm using Spring/JDBC terms).

 I know the QueryResponse has a getBeans method, but I would like to create
 my own mapping and plug it in.

 Any pointers?

 Thanks,
 Luis



Re: How To Make Index Backup at SolrCloud?

2013-04-25 Thread Otis Gospodnetic
You can use the index backup command that's part of index replication,
check the Wiki.
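
For example, something like this per core (the host and core name are
placeholders; see the replication wiki page for options such as location
and numberToKeep):

http://host:8983/solr/collection1/replication?command=backup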

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Apr 25, 2013 5:23 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I use SolrCloud. Let's assume that I want to move all indexes from one
 place to another. There may be two reasons for that:

 The first is that I will shut down my whole system and, some time later,
 bring it up somewhere else on new machines with the previous indexes (if
 it is a must, they may have the same network topology).
 The second is that I know SolrCloud handles failures, but I still want to
 back up my indexes in case of a disaster.

 How can I back up my indexes? I know that I can start up new nodes and
 close the old ones so I can move my indexes to other machines. However,
 how can I do such a backup (should I just copy the data folders of the
 Solr nodes and put them on new Solr nodes after I change the Zookeeper
 configuration)?

 What folks do?



Re: Too many close, count -1

2013-04-25 Thread Erick Erickson
One outside possibility (and 4.3 should refuse to start if this is the
case): is it possible that more than one of your cores has the same
name?

FWIW,
Erick

On Tue, Apr 23, 2013 at 5:30 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : Subject: Re: Too many close, count -1

 Thanks for the details, nothing jumps out at me, but we're now tracking
 this in SOLR-4753...

 https://issues.apache.org/jira/browse/SOLR-4753

 -Hoss


Re: Query specific replica

2013-04-25 Thread Erick Erickson
bq: I was wondering whether it is possible to query the same core on every request

Not that I know of. You can ping a single node by appending
distrib=false, but that
won't then look at multiple shards. If you don't have any shards, this
would work I think...
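
For example (host and collection name are placeholders):

http://host:8983/solr/collection1/select?q=*:*&distrib=false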

Best
Erick

On Tue, Apr 23, 2013 at 6:31 PM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
 Hello,
 Since I replicated my shards (I have 2 cores per shard now), I see a
 noticeable increase in qTime. I assume it happens because my memory now
 has to be split across twice as many cores as before.

 In my low-qps use case, I use replicas as shard backups only (in case one
 of my servers goes down) and not for serving parallel requests. In this
 case performance suffers because both cores of the shard are active.

 I was wondering whether it is possible to query the same core on every
 request, instead of load balancing between the different replicas, so that
 the second replica would start serving requests only if the leader replica
 goes down.

 Cheers,
 Manu


Re: Luke misreporting index-time boosts?

2013-04-25 Thread Erick Erickson
I think you're kinda missing the idea of index time boosting. The
semantic of this (as I remember Chris Hostetter explaining) is
this document's content is more important than other document's
content.

By doing an index-time boost that's the same for all your documents,
you're effectively doing nothing to the relative ranks of the results.

Not quite sure what Luke is doing here, but using debugQuery=on
will give you the actual scores of the actual documents. And if you're
doing anything like wildcards or *:* queries, shortcuts are taken
that set the scores to 1.0.
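
For example, something like (the host and core name are placeholders):

http://localhost:8983/solr/collection1/select?q=content&debugQuery=on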

If none of that helps, I'm out of my depth <G>...

Best
Erick

On Wed, Apr 24, 2013 at 6:01 AM, Timothy Hill timothy.d.h...@gmail.com wrote:
 Hello, all

 I have recently been attempting to apply index-time boosts to fields using
 the following syntax:

 <add>
 <doc>
 <field name="important_field" boost="5">bleah bleah bleah</field>
 <field name="standard_field" boost="2">content here</field>
 <field name="trivial_field">content here</field>
 </doc>
 <doc>
 <field name="important_field" boost="5">content here</field>
 <field name="standard_field" boost="2">bleah bleah bleah</field>
 <field name="trivial_field">content here</field>
 </doc>
 </add>

 The intention is that matches on important_field should count for more in
 the score than matches on trivial_field (so that a search across all fields
 for the term 'content' would return the second document above the first),
 while still being able to use the standard query parser.

 Looking at output from Luke, however, all fields are reported as having a
 boost of 1.0.

 The following possibilities occur to me.

 (1) The entire index-time-boosting approach is misconceived
 (2) Luke is misreporting, because index-time boosting alters more
 fundamental aspects of scoring (tf-idf calculations, I suppose), and the
 index-time boost is thus invisible to it
 (3) Some combination of (1) and (2)

 Can anyone help illuminate the situation for me? Documentation for these
 questions seems patchy.

 Thanks,

 Tim


Re: Facets with OR clause

2013-04-25 Thread Erick Erickson
If you're talking about _filter queries_, Kai's answer is good.

But your question is confusing. You
talk about facet queries, but then use fq, which is a filter
query and has nothing to do with facets at all, unless
you're talking about turning facet information into filter
queries.

FWIW,
Erick

On Wed, Apr 24, 2013 at 6:43 AM, Kai Becker m...@kai-becker.com wrote:
 Try fq=(groups:group1 OR locations:location1)

 Am 24.04.2013 um 12:39 schrieb vsl:

 Hi,

 my request contains the following.

 There are 3 facets:
 groups, locations, categories.

 When I select some items, I see this syntax in my request:
 fq=groups:group1&fq=locations:location1

 Is it possible to add an OR clause between facet items in the query?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Facets-with-OR-clause-tp4058553.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Using another way instead of DIH

2013-04-25 Thread xiaoqi
Thanks for the help.

data-config.xml? I cannot find this file. Do you mean data-import.xml or
solrconfig.xml?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-another-way-instead-of-DIH-tp4058937p4059067.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud 4.2 - Distributed Requests failing with NPE

2013-04-25 Thread Chris Hostetter

: trace:java.lang.NullPointerException\r\n\tat
: 
org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)\r\n\tat
: 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)\r\n\tat

yea, definitely a bug.  

Raintung reported this recently, and made a patch available...

https://issues.apache.org/jira/browse/SOLR-4705


-Hoss


Re: How do set compression for compression on stored fields in SOLR 4.2.1

2013-04-25 Thread Chris Hostetter
: Subject: How do set compression for compression on stored fields in SOLR 4.2.1
: 
: https://issues.apache.org/jira/browse/LUCENE-4226
: It mentions that we can set compression mode:
: FAST, HIGH_COMPRESSION, FAST_UNCOMPRESSION.

The compression details are hardcoded into the various codecs.  If you 
wanted to customize this, you'd need to write your own codec subclass...

https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/compressing/class-use/CompressionMode.html

See, for example, the implementations of Lucene41StoredFieldsFormat and 
Lucene42TermVectorsFormat...


public final class Lucene41StoredFieldsFormat extends 
CompressingStoredFieldsFormat {
  /** Sole constructor. */
  public Lucene41StoredFieldsFormat() {
    super("Lucene41StoredFields", CompressionMode.FAST, 1 << 14);
  }
}

public final class Lucene42TermVectorsFormat extends 
CompressingTermVectorsFormat {
  /** Sole constructor. */
  public Lucene42TermVectorsFormat() {
    super("Lucene41StoredFields", "", CompressionMode.FAST, 1 << 12);
  }
}
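
So a higher-compression variant is little more than this sketch (the class
name is invented, and you would still need to hook the format into a custom
Codec and register that codec via SPI):

public final class MyStoredFieldsFormat extends 
CompressingStoredFieldsFormat {
  /** Uses HIGH_COMPRESSION instead of FAST. */
  public MyStoredFieldsFormat() {
    super("MyStoredFields", CompressionMode.HIGH_COMPRESSION, 1 << 14);
  }
}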




-Hoss


Re: Question on storage and index/data management in solr

2013-04-25 Thread Vinay Rai
Thank you very much Shawn for a detailed response. Let me read all the 
documentation you pointed to and digest it.

Sure, if I do end up using Solr and need to make this change, I would love to 
submit it to the Lucene/Solr project as well.

Regards,
Vinay



 From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org 
Sent: Thursday, April 25, 2013 11:32 PM
Subject: Re: Question on storage and index/data management in solr
 

On 4/25/2013 8:39 AM, Vinay Rai wrote:
 1. Keep each of last 24 hours segments separate.
 2. Segments generated between last 48 to 24 hours to be merged into one. 
 Similarly, for segments created between 72 to 48 hours and so on for last 1 
 week.
 3. Similarly, merge previous 4 week's data into one segment each week.
 4. Merge all previous months data into one segment each month.
 
 I am not sure if there is a configuration possible in solr application. If 
 not, are there APIs which will allow me to do this?

To accomplish this exact scenario, you would probably have to write a
custom merge policy class for Lucene.  If you do so, I hope you'll
strongly consider donating it to the Lucene/Solr project.

Another approach: Use distributed search and put the divisions you are
looking at into separate indexes (shards) in their own cores.  You can
then manually do whatever index merging your situation requires.
Constructing the shards parameter for your queries will take some work.

Here's a blog post about this method and a video of the Lucene
Revolution talk mentioned in the blog post:

http://www.loggly.com/blog/2010/08/our-solr-system/
http://loggly.com/videos/lucene-revolution-2010/

I had the honor of being there for that talk in Boston.  They've done
some amazing things with Solr.

 Also, I want to understand how solr stores data or does it have a dependency 
 on the way data is stored. Since the volumes are high, it would be great if 
 the data is compressed and stored (while still searchable). If it is 
 possible, I would like to know what kind of compression does solr do?

Solr 4.1 uses compression for stored fields.  Solr 4.2 also uses
compression for term vectors.  From a performance perspective,
compression is probably not viable at this time for the indexed data,
but if that changes in the future, I'm sure that it will be added.

Here is documentation on the file format used by Solr 4.2:

http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Thanks,
Shawn

Re: Facets with 5000 facet fields - Out of memory error during the query time

2013-04-25 Thread sivaprasad
I got more information from the responses. Now it's time to take another look
at the number of facets to be configured.

Thanks,
Siva
http://smarttechies.wordpress.com/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facets-with-5000-facet-fields-Out-of-memory-error-during-the-query-time-tp4048450p4059079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud 4.2 - Distributed Requests failing with NPE

2013-04-25 Thread Sudhakar Maddineni
Thank you Hoss for looking into it.

-Sudhakar.


On Thu, Apr 25, 2013 at 6:50 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : trace:java.lang.NullPointerException\r\n\tat
 :
 org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)\r\n\tat
 :
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)\r\n\tat

 yea, definitely a bug.

 Raintung reported this recently, and made a patch available...

 https://issues.apache.org/jira/browse/SOLR-4705


 -Hoss