Re: Solr Cloud Stress Test

2015-01-17 Thread Rafał Kuć
Hello!

You'll need a data set that you can index. This is quite simple - if
you don't know what data you want to index or you don't have test
data, just take a Wikipedia dump and index it using the Data Import Handler.
You can find the example of using a Wikipedia dump with DIH at
http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia.
On the other hand, you can also extract documents with some simple code
and use a tool like JMeter to prepare an indexing stress test, to see how
far you can push your SolrCloud deployment when it comes to indexing.
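
For reference, a minimal data-config.xml sketch along the lines of the
wiki example (the dump path and field names here are illustrative
assumptions - adjust them to your schema):

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/data/enwiki-latest-pages-articles.xml">
      <!-- map the interesting parts of each dump page onto schema fields -->
      <field column="id"    xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="text"  xpath="/mediawiki/page/revision/text" />
    </entity>
  </document>
</dataConfig>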

You will also need queries, so that you can run query performance
tests. You can build some example queries using common phrases found
in Wikipedia - however, I'm not sure if there is something already
extracted, so you will either have to extract them yourself or use common
English phrases. You take that list of phrases (it would be nice to
have a lot of them, so they don't repeat) and run them against Solr.
Again you can use a tool like JMeter: create a test plan and run
tests with different numbers of threads until you find the point where
Solr just can't handle more.
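
As a quick alternative to a full JMeter plan, a hedged shell sketch that
replays a phrase list against Solr (the URL and collection name are
assumptions):

# one phrase per line in phrases.txt; run several copies in parallel
# to simulate concurrent query threads
while read -r phrase; do
  curl -s --get "http://localhost:8983/solr/collection1/select" \
       --data-urlencode "q=\"$phrase\"" \
       --data-urlencode "rows=10" > /dev/null
done < phrases.txt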

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi,

I am a student, planning to learn and do a features and functionality
 test of SolrCloud as one of my projects. I would like to do stress and
 performance testing of SolrCloud on my local machine (16 GB RAM,
 250 GB SSD and a 2.2 GHz Intel Core i7), covering multiple features of the
 cloud. What is the recommended way to get started with it?

 Thanks.
 David



Replicating external field files under Windows

2015-01-15 Thread Rafał Kuć
Hello!

I have a slight problem with replication and maybe someone has had
the same experience and knows if there is a solution. I have a Windows-
based Solr installation where I use the external field type - two fields,
two external files containing data. The deployment is a standard
master - slave setup. The problem is the replication. As we know,
replication can copy the index files and the configuration files. On
Linux systems I used to use relative paths to the conf/ dir and I was
able to copy the external field files as well. On Windows I can't, even
though the paths seem to be OK.

Does anyone have similar experience with Windows, replication and external
field type files?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch



Re: Query about vacuum full

2014-07-28 Thread Rafał Kuć
Hello!

Please direct this question to the PostgreSQL mailing list. It is
purely about that database, and this mailing list is about
Solr.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi,

 I am seeing a considerable decrease in the speed of indexing documents.

 I am using PostgreSQL.

 So is this the right time to do a vacuum on PostgreSQL, because I have been
 using it for a week?


 Also, to invoke vacuum full, do I just need to go to the PostgreSQL command
 prompt and run the VACUUM FULL command?


 Thanks,
 Ameya



Re: Java heap Space error

2014-07-22 Thread Rafał Kuć
Hello!

Yes, just edit your Jetty configuration file and add the -Xmx and -Xms
parameters. For example, the file you may be looking for is
/etc/default/jetty.
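
A hedged sketch of what that could look like (the variable name and
values are assumptions - they depend on how your Jetty package wires the
defaults file into its start script):

# /etc/default/jetty
# -Xms is the initial heap size, -Xmx the maximum heap size
JAVA_OPTIONS="-Xms512m -Xmx2048m $JAVA_OPTIONS"

Restart Jetty afterwards so the new heap settings take effect.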

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 So can I overcome this exception by increasing the heap size somewhere?

 Thanks,
 Ameya


 On Tue, Jul 22, 2014 at 2:00 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/22/2014 11:37 AM, Ameya Aware wrote:
  i am running into java heap space issue. Please see below log.

 All we have here is an out of memory exception.  It is impossible to
 know *why* you are out of memory from the exception.  With enough
 investigation, we could determine the area of code where the error
 occurred, but that doesn't say anything at all about what allocated all
 the memory, it's simply the final allocation that pushed it over the edge.

 Here is some generic information about Solr and high heap usage:

 http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 Thanks,
 Shawn





Re: [ANNOUNCE] Apache Solr 4.8.0 released

2014-04-29 Thread Rafał Kuć
Hello!

You don't need the <fields> and <types> sections anymore; you can just
include type or field definitions anywhere in schema.xml.
You can find more in https://issues.apache.org/jira/browse/SOLR-5228
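
For illustration, a hedged schema.xml sketch with field and fieldType
definitions intermixed at the top level (the names are the usual
example-schema ones, nothing mandated):

<schema name="example" version="1.5">
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" />
  <field name="_version_" type="long" indexed="true" stored="true" />
  <uniqueKey>id</uniqueKey>
</schema>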

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 In which sense are <fields> and <types> now deprecated in schema.xml? Where can
 I find any pointer about this?

 On Mon, Apr 28, 2014 at 6:54 PM, Uwe Schindler uschind...@apache.orgwrote:

 28 April 2014, Apache Solr™ 4.8.0 available

 The Lucene PMC is pleased to announce the release of Apache Solr 4.8.0

 Solr is the popular, blazing fast, open source NoSQL search platform
 from the Apache Lucene project. Its major features include powerful
 full-text search, hit highlighting, faceted search, dynamic
 clustering, database integration, rich document (e.g., Word, PDF)
 handling, and geospatial search.  Solr is highly scalable, providing
 fault tolerant distributed search and indexing, and powers the search
 and navigation features of many of the world's largest internet sites.

 Solr 4.8.0 is available for immediate download at:
   http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

 See the CHANGES.txt file included with the release for a full list of
 details.

 Solr 4.8.0 Release Highlights:

 * Apache Solr now requires Java 7 or greater (recommended is
   Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions
   have known JVM bugs affecting Solr).

 * Apache Solr is fully compatible with Java 8.

 * <fields> and <types> tags have been deprecated from schema.xml.
   There is no longer any reason to keep them in the schema file,
   they may be safely removed. This allows intermixing of fieldType,
   field and copyField definitions if desired.

 * The new {!complexphrase} query parser supports wildcards, ORs etc.
   inside Phrase Queries.

 * New Collections API CLUSTERSTATUS action reports the status of
   collections, shards, and replicas, and also lists collection
   aliases and cluster properties.

 * Added managed synonym and stopword filter factories, which enable
   synonym and stopword lists to be dynamically managed via REST API.

 * JSON updates now support nested child documents, enabling {!child}
   and {!parent} block join queries.

 * Added ExpandComponent to expand results collapsed by the
   CollapsingQParserPlugin, as well as the parent/child relationship
   of nested child documents.

 * Long-running Collections API tasks can now be executed
   asynchronously; the new REQUESTSTATUS action provides status.

 * Added a hl.qparser parameter to allow you to define a query parser
   for hl.q highlight queries.

 * In Solr single-node mode, cores can now be created using named
   configsets.

 * New DocExpirationUpdateProcessorFactory supports computing an
   expiration date for documents from the TTL expression, as well as
   automatically deleting expired documents on a periodic basis.

 Solr 4.8.0 also includes many other new features as well as numerous
 optimizations and bugfixes of the corresponding Apache Lucene release.

 Please report any feedback to the mailing lists
 (http://lucene.apache.org/solr/discussion.html)

 Note: The Apache Software Foundation uses an extensive mirroring network
 for distributing releases.  It is possible that the mirror you are using
 may not have replicated the release yet.  If that is the case, please
 try another mirror.  This also goes for Maven access.

 -
 Uwe Schindler
 uschind...@apache.org
 Apache Lucene PMC Chair / Committer
 Bremen, Germany
 http://lucene.apache.org/






Re: DIH issues with 4.7.1

2014-04-26 Thread Rafał Kuć
Hello!

Look at the javadocs for both. The granularity of
System.currentTimeMillis() depends on the operating system, so it may
happen that calls to that method that are 1 millisecond apart
still return the same value. This is not the case with
System.nanoTime() -
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
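
A small self-contained sketch of the difference when measuring elapsed
time (the sleep is just a stand-in for the timed work):

import java.util.concurrent.TimeUnit;

public class ElapsedTimeDemo {
    public static void main(String[] args) throws InterruptedException {
        // currentTimeMillis() is wall-clock time with OS-dependent granularity,
        // so two nearby calls may return the same value; nanoTime() is a
        // monotonic timer intended precisely for elapsed-time measurement.
        long start = System.nanoTime();
        Thread.sleep(5); // stand-in for the work being timed
        long elapsed = System.nanoTime() - start;
        System.out.println("elapsed: " + TimeUnit.NANOSECONDS.toMillis(elapsed) + " ms");
    }
}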

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi
I have just compared the difference between versions 4.6.0 and 4.7.1.
 I noticed that the time in the getConnection function is obtained with
 System.nanoTime in 4.7.1, while 4.6.0 used System.currentTimeMillis().
 I am curious about the reason for the change and the benefit of it. Is it
 necessary?
I have read SOLR-5734:
 https://issues.apache.org/jira/browse/SOLR-5734
I did some googling about the difference between currentTimeMillis and nanoTime, but
 still can not figure it out.




 2014-04-26 2:24 GMT+08:00 Shawn Heisey s...@elyograg.org:

 On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:

 I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
 process that we are using takes 4x as long to complete.  The only odd
 thing I notice is when I enable debug logging for the dataimporthandler
 process, it appears that in the new version each sql query is resulting in
 a new connection opened through jdbcdatasource (log:
 http://pastebin.com/JKh4gpmu).  Were there any changes that would affect
 the speed of running a full import?


 This is most likely the problem you are experiencing:

 https://issues.apache.org/jira/browse/SOLR-5954

 The fix will be in the new 4.8 version.  The release process for 4.8 is
 underway right now.  A second release candidate was required yesterday.  If
 no further problems are encountered, the release should be made around the
 middle of next week.  If problems are encountered, the release will be
 delayed.

 Here's something very important that has been mentioned before:  Solr 4.8
 will require Java 7.  Previously, Java 6 was required.  Java 7u55 (the
 current release from Oracle as I write this) is recommended as a minimum.

 If a 4.7.3 version is built, this is a fix that we should backport.

 Thanks,
 Shawn





Re: Error on startup

2014-04-24 Thread Rafał Kuć
Hello!

What are the start-up parameters for Tomcat?

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Running Solr 4.6.1, Tomcat 7.0.29, Zookeeper 3.4.6, Java 6

 I have 3 Tomcats running, each with their own Solr war, all on the same
 box, along with 5 ZK nodes.  It's a dev box.

 I can get the SolrCloud up and running, then use the Collections API to get
 everything going.  It's all fine until I stop Tomcat and restart it.  Then
 I get this error:

 ERROR - 2014-04-24 14:48:08.677;
 org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check
 solr/home property and the logs
 ERROR - 2014-04-24 14:48:08.705; org.apache.solr.common.SolrException;
 null:java.lang.IllegalArgumentException: Illegal directory:
 /data/solr1/test_shard1_replica3/conf
 at
 org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1331)
 at
 org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:1373)
 at
 org.apache.solr.cloud.ZkController.bootstrapConf(ZkController.java:1558)
 at
 org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:193)
 at
 org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:73)
 at
 org.apache.solr.core.CoreContainer.load(CoreContainer.java:208)
 at
 org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:183)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:133)
 at
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:269)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
 at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4650)
 at
 org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5306)
 at
 org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
 at
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
 at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
 at
 org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
 at
 org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
 at
 org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 From looking at the source, it's trying to upload the conf directories.
 When I use the API to pull the configs from ZK, it doesn't download a conf
 directory, and I think that's why the error is caused.

 Is there any way around this?  I'm using this dev box to get ready to move
 my production Solr instances up to the newer versions, but I'm worried that
 I'd run into this issue there.

 Any advice would be helpful, let me know if there's anything else I can
 provide to help debug this.

 Thanks!


 -- Chris



Re: Error on startup

2014-04-24 Thread Rafał Kuć
Hello!

You have bootstrap_conf=true. It uploads a configuration set for each
core in the solr.xml file. I would suggest dropping that parameter and
uploading the configuration once, using the scripts provided with Solr.
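
A hedged example of that one-time upload using the zkcli script shipped
with Solr 4.x (under example/cloud-scripts/; the paths, ZooKeeper address
and config name are assumptions):

./zkcli.sh -zkhost localhost:2181 \
           -cmd upconfig \
           -confdir /path/to/solr/collection1/conf \
           -confname myconf

Once the config set is in ZooKeeper under that name, collections can
reference it and bootstrap_conf is no longer needed at startup.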

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 These get added to the startup of Tomcat:
  -DhostPort=8181 -Djetty.port=8181
 -DzkHost=localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185
 -Dbootstrap_conf=true -Dport=8181 -DhostContext=solr
 -DzkClientTimeout=2


 -- Chris


 On Thu, Apr 24, 2014 at 11:41 AM, Rafał Kuć r@solr.pl wrote:

 Hello!

 What are the start-up parameters for Tomcat?

 --
 Regards,
  Rafał Kuć
 Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/


  Running Solr 4.6.1, Tomcat 7.0.29, Zookeeper 3.4.6, Java 6

  I have 3 Tomcats running, each with their own Solr war, all on the same
  box, along with 5 ZK nodes.  It's a dev box.

  I can get the SolrCloud up and running, then use the Collections API to
 get
  everything going.  It's all fine until I stop Tomcat and restart it.
  Then
  I get this error:

  ERROR - 2014-04-24 14:48:08.677;
  org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check
  solr/home property and the logs
  ERROR - 2014-04-24 14:48:08.705; org.apache.solr.common.SolrException;
  null:java.lang.IllegalArgumentException: Illegal directory:
  /data/solr1/test_shard1_replica3/conf
  at
  org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1331)
  at
 
 org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:1373)
  at
  org.apache.solr.cloud.ZkController.bootstrapConf(ZkController.java:1558)
  at
  org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:193)
  at
  org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:73)
  at
  org.apache.solr.core.CoreContainer.load(CoreContainer.java:208)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:183)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:133)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:269)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
  at
 
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4650)
  at
 
 org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5306)
  at
  org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
  at
 
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
  at
  org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
  at
  org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
  at
 
 org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
  at
 
 org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
  at
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)

  From looking at the source, it's trying to upload the conf directories.
  When I use the API to pull the configs from ZK, it doesn't download a
 conf
  directory, and I think that's why the error is caused.

  Is there any way around this?  I'm using this dev box to get ready to
 move
  my production Solr instances up to the newer versions, but I'm worried
 that
  I'd run into this issue there.

  Any advice would be helpful, let me know if there's anything else I can
  provide to help debug this.

  Thanks!


  -- Chris





Re: Transformation on a numeric field

2014-04-15 Thread Rafał Kuć
Hello!

You can achieve that using an update processor, for example look here:
http://wiki.apache.org/solr/ScriptUpdateProcessor

What you would have to do, in general, is create a script that takes
the value of the field, divides it by 1000 and puts it in another
field - the target numeric field.
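
A hedged sketch of what that could look like with the
StatelessScriptUpdateProcessorFactory (the chain name, script name and
the salary/truncated_salary field names are assumptions based on this
thread). In solrconfig.xml:

<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">divide-by-1000.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

And in conf/divide-by-1000.js:

function processAdd(cmd) {
  var doc = cmd.solrDoc;                    // the incoming document
  var salary = doc.getFieldValue("salary"); // assumed source field
  if (salary != null) {
    // keep only the thousands, e.g. 56789 -> 56
    doc.setField("truncated_salary", Math.floor(salary / 1000));
  }
}
// the factory expects the other lifecycle functions to exist, even if empty
function processDelete(cmd) { }
function processMergeIndexes(cmd) { }
function processCommit(cmd) { }
function processRollback(cmd) { }
function finish() { }

Remember to reference the chain from your update handler
(update.chain=script) so it actually runs.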

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi All,

 I am looking for a way to index a numeric field and its value
 divided by 1 000 into another numeric field.
 I thought about using a CopyField with a
 PatternReplaceFilterFactory to keep only the first few digits (cutting the 
 last three).

 Solr complains that I can not have an analysis chain on a numeric field:

 Core:
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
 Plugin init failure for [schema.xml] fieldType truncated_salary:
 FieldType: TrieIntField (truncated_salary) does not support
 specifying an analyzer. Schema file is
 /data/solr/solr-no-cloud/Core1/schema.xml


 Is there a way to accomplish this ?

 Thanks



Re: Solr dosn't load index at startup: out of memory

2014-04-11 Thread Rafał Kuć
Hello!

Do you have warming queries defined? 

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi,
 my Solr (v. 4.5), after months of work, suddenly stopped indexing: it responded
 to queries but didn't index any new data. Here is the error message:
 ERROR - 2014-04-11 15:52:30.317; org.apache.solr.common.SolrException;
 java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot
 commit
 at
 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2788)

 So, I restarted Solr with more RAM (from 4 GB up to 8 GB) but now Solr can't
 load the cores. Here is the error message:
 ERROR - 2014-04-11 16:32:50.509;
 org.apache.solr.core.CoreContainer; Unable
 to create core: posts
 org.apache.solr.common.SolrException: Error Instantiating Update Handler,
 solr.DirectUpdateHandler2 failed to instantiate
 org.apache.solr.update.UpdateHandler
 ...
 Caused by: java.lang.OutOfMemoryError: Java heap space

 Anyone can help me?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-dosn-t-load-index-at-startup-out-of-memory-tp4130665.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud 4.6.1 hanging

2014-03-29 Thread Rafał Kuć
Hello!

I've created SOLR-5935; please let me know if more information is
needed. I'll be glad to help on the issue.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 I'm looking into a hang as well - not sure if it involves searching
 as well, but it may. Can you file a JIRA issue - let's track it down.

 - Mark

 On Mar 28, 2014, at 8:07 PM, Rafał Kuć r@solr.pl wrote:
 
 Hello!
 
 I have an issue with one of the SolrCloud deployments and I wanted to
 ask whether someone has had a similar issue. Six machines, a collection with
 6 shards with a replication factor of 3. It all runs on 6 physical
 servers, each with 24 cores. We've indexed about 32 million documents
 and everything was fine until that point.
 
 Now, during performance tests, we ran into an issue - SolrCloud hangs
 when querying and indexing are run at the same time. First we see a
 normal load on the machines, then the load starts to drop and a thread
 dump shows numerous threads like this:
 
 Thread 12624: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
 may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
 line=186 (Compiled frame)
 - 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
  @bci=42, line=2043 (Compiled frame)
 - org.apache.http.pool.PoolEntryFuture.await(java.util.Date) @bci=50, 
 line=131 (Compiled frame)
 - 
 org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(java.lang.Object, 
 java.lang.Object, long, java.util.concurrent.TimeUnit, 
 org.apache.http.pool.PoolEntryFuture) @bci=431, line=281 (Compiled frame)
 - 
 org.apache.http.pool.AbstractConnPool.access$000(org.apache.http.pool.AbstractConnPool,
  java.lang.Object, java.lang.Object, long, java.util.concurrent.TimeUnit, 
 org.apache.http.pool.PoolEntryFuture) @bci=8, line=62 (Compiled frame)
 - org.apache.http.pool.AbstractConnPool$2.getPoolEntry(long, 
 java.util.concurrent.TimeUnit) @bci=15, line=176 (Compiled frame)
 - org.apache.http.pool.AbstractConnPool$2.getPoolEntry(long, 
 java.util.concurrent.TimeUnit) @bci=3, line=169 (Compiled frame)
 - org.apache.http.pool.PoolEntryFuture.get(long, 
 java.util.concurrent.TimeUnit) @bci=38, line=100 (Compiled frame)
 - 
 org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(java.util.concurrent.Future,
  long, java.util.concurrent.TimeUnit) @bci=4, line=212 (Compiled frame)
 - 
 org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(long,
  java.util.concurrent.TimeUnit) @bci=10, line=199 (Compiled frame)
 - 
 org.apache.http.impl.client.DefaultRequestDirector.execute(org.apache.http.HttpHost,
  org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) 
 @bci=259, line=456 (Compiled frame)
 - 
 org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.HttpHost,
  org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) 
 @bci=344, line=906 (Compiled frame)
 - 
 org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest,
  org.apache.http.protocol.HttpContext) @bci=21, line=805 (Compiled frame)
 - 
 org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest)
  @bci=6, line=784 (Compiled frame)
 - 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest,
  org.apache.solr.client.solrj.ResponseParser) @bci=1175, line=395 
 (Interpreted frame)
 - 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest)
  @bci=17, line=199 (Compiled frame)
 - 
 org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(org.apache.solr.client.solrj.impl.LBHttpSolrServer$Req)
  @bci=132, line=285 (Interpreted frame)
 - 
 org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(org.apache.solr.client.solrj.request.QueryRequest,
  java.util.List) @bci=13, line=214 (Compiled frame)
 - org.apache.solr.handler.component.HttpShardHandler$1.call() @bci=246, 
 line=161 (Compiled frame)
 - org.apache.solr.handler.component.HttpShardHandler$1.call() @bci=1, 
 line=118 (Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334 
 (Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Compiled frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=471 
 (Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334 
 (Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Compiled frame)
 - 
 java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
  @bci=95, line=1145 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 
 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)

SolrCloud 4.6.1 hanging

2014-03-28 Thread Rafał Kuć
Hello!

I have an issue with one of the SolrCloud deployments and I wanted to
ask whether someone has had a similar issue. Six machines, a collection with
6 shards with a replication factor of 3. It all runs on 6 physical
servers, each with 24 cores. We've indexed about 32 million documents
and everything was fine until that point.

Now, during performance tests, we ran into an issue - SolrCloud hangs
when querying and indexing are run at the same time. First we see a
normal load on the machines, then the load starts to drop and a thread
dump shows numerous threads like this:

Thread 12624: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may 
be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
line=186 (Compiled frame)
 - 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() 
@bci=42, line=2043 (Compiled frame)
 - org.apache.http.pool.PoolEntryFuture.await(java.util.Date) @bci=50, line=131 
(Compiled frame)
 - org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(java.lang.Object, 
java.lang.Object, long, java.util.concurrent.TimeUnit, 
org.apache.http.pool.PoolEntryFuture) @bci=431, line=281 (Compiled frame)
 - 
org.apache.http.pool.AbstractConnPool.access$000(org.apache.http.pool.AbstractConnPool,
 java.lang.Object, java.lang.Object, long, java.util.concurrent.TimeUnit, 
org.apache.http.pool.PoolEntryFuture) @bci=8, line=62 (Compiled frame)
 - org.apache.http.pool.AbstractConnPool$2.getPoolEntry(long, 
java.util.concurrent.TimeUnit) @bci=15, line=176 (Compiled frame)
 - org.apache.http.pool.AbstractConnPool$2.getPoolEntry(long, 
java.util.concurrent.TimeUnit) @bci=3, line=169 (Compiled frame)
 - org.apache.http.pool.PoolEntryFuture.get(long, 
java.util.concurrent.TimeUnit) @bci=38, line=100 (Compiled frame)
 - 
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(java.util.concurrent.Future,
 long, java.util.concurrent.TimeUnit) @bci=4, line=212 (Compiled frame)
 - 
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(long, 
java.util.concurrent.TimeUnit) @bci=10, line=199 (Compiled frame)
 - 
org.apache.http.impl.client.DefaultRequestDirector.execute(org.apache.http.HttpHost,
 org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) @bci=259, 
line=456 (Compiled frame)
 - 
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.HttpHost,
 org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) @bci=344, 
line=906 (Compiled frame)
 - 
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest,
 org.apache.http.protocol.HttpContext) @bci=21, line=805 (Compiled frame)
 - 
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest)
 @bci=6, line=784 (Compiled frame)
 - 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest,
 org.apache.solr.client.solrj.ResponseParser) @bci=1175, line=395 (Interpreted 
frame)
 - 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest)
 @bci=17, line=199 (Compiled frame)
 - 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(org.apache.solr.client.solrj.impl.LBHttpSolrServer$Req)
 @bci=132, line=285 (Interpreted frame)
 - 
org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(org.apache.solr.client.solrj.request.QueryRequest,
 java.util.List) @bci=13, line=214 (Compiled frame)
 - org.apache.solr.handler.component.HttpShardHandler$1.call() @bci=246, 
line=161 (Compiled frame)
 - org.apache.solr.handler.component.HttpShardHandler$1.call() @bci=1, line=118 
(Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334 
(Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Compiled frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=471 
(Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334 
(Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Compiled frame)
 - 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
 @bci=95, line=1145 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 
(Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)

I've checked I/O statistics, GC working, memory usage, networking and
all of that - those resources are not exhausted during the test.

Hard autocommit is set to 15 seconds with openSearcher=false and
softAutocommit to 4 hours. We have a fairly high query rate, but until
we start indexing everything runs smoothly.

Has anyone encountered similar behavior?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch



Re: stored=true vs stored=false, in terms of storage

2014-03-26 Thread Rafał Kuć
Hello!

Stored fields, the ones with stored=true, mean that the original value
of the field is stored and can be retrieved with the document in
search results. The values of fields marked as stored=false can't be
retrieved during searching. That is the difference.
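
For illustration, a hedged schema.xml sketch of the two variants (field
and type names are assumptions):

<!-- indexed but not stored: searchable, original value not retrievable -->
<field name="content" type="text_general" indexed="true" stored="false" />
<!-- stored but not indexed: not searchable, but returned with results -->
<field name="debug_info" type="string" indexed="false" stored="true" />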

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi,

 I am using Solr and I have one doubt.

 If any field has stored=false, does it mean that this field is stored on
 disk and not in main memory, and will be loaded whenever asked?


 The scenario I would like to handle: in my case there is a lot of
 information which I need to show when debugQuery=true, and I can take the
 latency hit on debugQuery=true.

 Can I save all the information in a field with indexed=false and
 stored=true?

 And how is debug information normally saved?


 Regards,
 Pramod Negi



Re: organize folder inside Solr

2014-03-08 Thread Rafał Kuć
Hello!

There are multiple tutorials on how to do this, for example:

1. http://solr.pl/en/2011/03/21/solr-and-tika-integration-part-1-basics/
2. http://wiki.apache.org/solr/ExtractingRequestHandler

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Thanks, I followed it carefully, 
 the example in the tutorial indexes only XML files, and that is my
 problem:
 I want my search engine to look for other formats like pictures, music, PDF,
 and so on.
 And I'm working just on Collection1.

 Med. 



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122242.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using numeric ranges in Solr query

2014-02-12 Thread Rafał Kuć
Hello!

Just specify the left boundary, like: price:[900 TO 1000]
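
For example, as a full request (host and core name are assumptions; the
fq variant lets Solr cache the filter):

http://localhost:8983/solr/collection1/select?q=*:*&fq=price:[900+TO+1000]&rows=10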

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 When a user enters a price in the price field, for ex: 1000 USD, I want to fetch all
 items with a price around 1000 USD. I found in the documentation that I can use
 price:[* TO 1000]. It will get all items from 1 to 1000 USD.
 But I want to get results where the price is between 900 and 1000 USD.

 Any help is appreciated.

 Thank You.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Using-numeric-ranges-in-Solr-query-tp4116941.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Realtimeget SolrCloud

2014-01-31 Thread Rafał Kuć
Hello!

Do you have the realtime get handler defined in your solrconfig.xml? This
part should be present:

  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hello,

 I am currently experimenting with moving our Solr instance into a SolrCloud
 setup.
 I am getting an error trying to access the realtime get handler:

 HTTP ERROR 404
 Problem accessing /solr/collection1/get. Reason:
 Not Found

 They work fine in the normal Solr setup. Do I need some changes in
 configurations? Am I missing something? Or is it not supported in the cloud
 environment?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Realtimeget-SolrCloud-tp4114595.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Realtimeget SolrCloud

2014-01-31 Thread Rafał Kuć
Hello!

No problem. Also remember that you need the _version_ field to be
present in your schema.
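
That is, something along these lines in schema.xml (the type name assumes
the usual long definition from the example schema), plus an enabled update
log in solrconfig.xml, which real-time get relies on:

<field name="_version_" type="long" indexed="true" stored="true" />

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>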

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



 That seemed to be the issue.

 I had several other request handlers as I wasn't using the simple /get, but
 apparently in SolrCloud this handler must be present in order to use the
 class RealTimeGetHandler at all.

 Thank you!



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Realtimeget-SolrCloud-tp4114595p4114598.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to get phrase recipe working?

2014-01-21 Thread Rafał Kuć
Hello,

Phrase search will work on any analyzed field if you use the " character to surround
your phrase.

Are you looking for something specific or a standard phrase search?

Also - if you are looking for exact phrase search, you may want to remove the
Snowball filter.

Rafał Kuć

 Original message 
From: eShard zim...@yahoo.com
Date: 21.01.2014 18:11 (GMT+01:00)
To: solr-user@lucene.apache.org
Subject: How to get phrase recipe working?

Good morning,
  In the Apache Solr 4 Cookbook, p. 112, there is a recipe for setting up
phrase searches, like so:
<fieldType name="text_phrase" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

I ran a sample query q=text_ph:"a-z index" and it didn't work very well at
all.
Is there a better way to do phrase searches? 
I need a specific configuration to follow/use.
Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-phrase-recipe-working-tp4112484.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with starting solr

2014-01-12 Thread Rafał Kuć
Hello!

If you want to use Morfologik you need to include its jar files,
because they are not included in the standard distribution. Look at the
downloaded Solr distribution; you should find the
contrib/analysis-extras/lib directory there. Get those libraries, put
them in a directory (ie: /var/solr/ext) and add a section like this to your
solrconfig.xml (you can find it in the conf directory of Solr):

<lib dir="/var/solr/ext" regex=".*\.jar" />

In addition to that you'll need the
lucene-analyzers-morfologik-4.6.0.jar from
contrib/analysis-extras/lucene-lib. Just put it in the same directory.

After that restart Solr and you should be OK.

Btw - you can find the instructions (although a bit old) at:
http://solr.pl/2012/04/02/solr-4-0-i-mozliwosci-analizy-jezyka-polskiego/
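
With the jars in place, a hedged schema.xml sketch of a Polish-analyzed
field type using Morfologik (the filter defaults are assumed to pick the
Polish dictionary; the linked post has the details):

<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.MorfologikFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>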

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi.
 I'm a total beginner to Solr.
 When I try to start it I get the following errors:

 lukasz@lukasz-VirtualBox:~/PycharmProjects/solr/solr-4.6.0/example$ java
 -jar start.jar 
 0[main] INFO  org.eclipse.jetty.server.Server  – jetty-8.1.10.v20130312
 104  [main] INFO 
 org.eclipse.jetty.deploy.providers.ScanningAppProvider  –
 Deployment monitor
 /home/lukasz/PycharmProjects/solr/solr-4.6.0/example/contexts at interval 0
 214  [main] INFO  org.eclipse.jetty.deploy.DeploymentManager  – Deployable
 added:
 /home/lukasz/PycharmProjects/solr/solr-4.6.0/example/contexts/solr-jetty-context.xml
 9566 [main] INFO 
 org.eclipse.jetty.webapp.StandardDescriptorProcessor  – NO
 JSP Support for /solr, did not find
 org.apache.jasper.servlet.JspServlet
 9746 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  –
 SolrDispatchFilter.init()
 9805 [main] INFO  org.apache.solr.core.SolrResourceLoader  – JNDI not
 configured for solr (NoInitialContextEx)
 9808 [main] INFO  org.apache.solr.core.SolrResourceLoader  – solr home
 defaulted to 'solr/' (could not find system property or JNDI)
 9809 [main] INFO  org.apache.solr.core.SolrResourceLoader  – new
 SolrResourceLoader for directory: 'solr/'
 10116 [main] INFO  org.apache.solr.core.ConfigSolr  – Loading container
 configuration from
 /home/lukasz/PycharmProjects/solr/solr-4.6.0/example/solr/solr.xml
 10406 [main] INFO  org.apache.solr.core.CoreContainer  – New CoreContainer
 30060911
 10407 [main] INFO  org.apache.solr.core.CoreContainer  – Loading cores into
 CoreContainer [instanceDir=solr/]
 10479 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting socketTimeout to: 0
 10479 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting urlScheme to: http://
 10480 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting connTimeout to: 0
 10481 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting maxConnectionsPerHost to: 20
 10483 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting corePoolSize to: 0
 10485 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting maximumPoolSize to: 2147483647
 10486 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting maxThreadIdleTime to: 5
 10487 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting sizeOfQueue to: -1
 10488 [main] INFO 
 org.apache.solr.handler.component.HttpShardHandlerFactory 
 – Setting fairnessPolicy to: false
 10702 [main] INFO  org.apache.solr.logging.LogWatcher  – SLF4J impl is
 org.slf4j.impl.Log4jLoggerFactory
 10704 [main] INFO  org.apache.solr.logging.LogWatcher  – Registering Log
 Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
 10839 [coreLoadExecutor-3-thread-1] INFO 
 org.apache.solr.core.CoreContainer 
 – Creating SolrCore 'content' using instanceDir: solr/content
 10839 [coreLoadExecutor-3-thread-1] INFO 
 org.apache.solr.core.SolrResourceLoader  – new SolrResourceLoader for
 directory: 'solr/content/'
 10850 [coreLoadExecutor-3-thread-1] INFO 
 org.apache.solr.core.SolrResourceLoader  – Adding
 'file:/home/lukasz/PycharmProjects/solr/solr-4.6.0/example/solr/content/lib/lucene-analyzers-morfologik-4.6.0.jar'
 to classloader
 10855 [coreLoadExecutor-3-thread-1] INFO 
 org.apache.solr.core.SolrResourceLoader  – Adding
 'file:/home/lukasz/PycharmProjects/solr/solr-4.6.0/example/solr/content/lib/morfologik-fsa-1.8.1.jar'
 to classloader
 10857 [coreLoadExecutor-3-thread-1] INFO 
 org.apache.solr.core.SolrResourceLoader  – Adding
 'file:/home/lukasz/PycharmProjects/solr/solr-4.6.0/example/solr/content/lib/morfologik-polish-1.8.1.jar'
 to classloader
 10859 [coreLoadExecutor-3-thread-1] INFO 
 org.apache.solr.core.SolrResourceLoader  – Adding
 'file:/home/lukasz/PycharmProjects/solr/solr-4.6.0/example/solr/content/lib/morfologik-stemming-1.8.1.jar'
 to classloader
 11005 [coreLoadExecutor-3-thread-1] INFO 
 org.apache.solr.core.SolrConfig  –
 Adding specified lib dirs

Re: Solr Query Slowliness

2013-12-26 Thread Rafał Kuć
Hello!

Could you tell us more about your scripts? What do they do? Are the
queries the same? How many results do you fetch with your scripts, and
so on.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi all,

 I have multiple python scripts querying solr with the sunburnt module.

 Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory
 & 840 GB storage) and contained several cores for different usage.

 When I manually executed a query through Solr Admin (a query containing
 10~15 terms, with some of them having boosts, over one field and limited to
 one result without any sorting or faceting etc.) it takes around 700
 ms, and the core contained 7 million documents.

 When the scripts are executed things get slower; my query takes 7~10s.

 Then what I did was turn to SolrCloud, expecting a huge performance increase.

 I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU
 with 28 ECU, 15 GB memory & 160 GB SSD storage), then I created one collection
 to contain the core I was querying. I sharded it into 25 shards (each node
 containing 5 shards without replication); each shard took 54 MB of storage.

 Tested my query on the new SolrCloud: it takes 70 ms! A huge increase which
 is very good!

 Tested my scripts again (I have 30 scripts running at the same time), and
 to my surprise, things run fast for 5 seconds then everything turns really slow
 again (query time ).

 I updated the solrconfig.xml to remove the query caches (I don't need them
 since the queries are very different, one-time-only queries) and changed the
 index memory to 1 GB, but only got a small increase (3~4s for each query ?!)

 Any ideas ?

 PS: My index size will not stay at 7m documents, it will grow to +100m
 and that may make things worse



Re: Solr Query Slowliness

2013-12-26 Thread Rafał Kuć
Hello!

Different queries can have different execution times, that's why I
asked about the details. When running the scripts, is the Solr CPU fully
utilized? To tell you more I would like to see what queries are run
against Solr from the scripts.

Do you have any information on network throughput between the server
you are running the scripts on and the Solr cluster? You wrote that the
scripts are fine for 5 seconds and then they get slow. If your Solr
cluster is not fully utilized, I would take a look at the queries and
what they return (ie. using faceting with facet.limit=-1) and see
if the network is able to process those.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Thanks Rafal for your reply,

 My scripts are running on other independent machines so they do not
 affect Solr. I did mention that the queries are not the same (that is why I
 removed the query cache from solrconfig.xml), and I only get 1 result from
 Solr (which is the top scored one, so no sorting, since it is by default
 ordered by score)



 2013/12/26 Rafał Kuć r@solr.pl

 Hello!

 Could you tell us more about your scripts? What they do? If the
 queries are the same? How many results you fetch with your scripts and
 so on.

 --
 Regards,
  Rafał Kuć
 Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/


  Hi all,

  I have multiple python scripts querying solr with the sunburnt module.

  Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB
 memory
   840 GB storage) and contained several cores for different usage.

  When I manually executed a query through Solr Admin (a query containing
  10~15 terms, with some of them having boosts over one field and limited
 to
  one result without any sorting or faceting etc ) it takes around 700
  ms, and the Core contained 7 million documents.

  When the scripts are executed things get slower, my query takes 7~10s.

  Then what I did is to turn to SolrCloud expecting huge performance
 increase.

  I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU
  with 28 ECU, 15 GB memory  160 SSD storage), then I created one
 collection
  to contain the core I was querying, I sharded it to 25 shards (each node
  containing 5 shards without replication), each shards took 54 MB of
 storage.

  Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich
  is very good !

  Tested my scripts again (I have 30 scripts running at the same time), and
  as a surprise, things run fast for 5 seconds then it turns realy slow
 again
  (query time ).

  I updated the solrconfig.xml to remove the query caches (I don't need
 them
  since queries are very different and only 1 time queries) and changes the
  index memory to 1 GB, but only got a small increase (3~4s for each query
 ?!)

  Any ideas ?

  PS: My index size will not stay with 7m documents, it will grow to +100m
  and that may get things worse





Re: Solr Query Slowliness

2013-12-26 Thread Rafał Kuć
Hello!

It seems that the number of queries per second generated by your
scripts may be too much for your Solr cluster to handle with the
latency you want.

Try launching your scripts one by one and see what the bottleneck is
with your instance. I assume that for some number of scripts running
at the same time you will have good performance, and it will start to
degrade after you start adding even more.

If you don't have a high commit rate and you don't need NRT, disabling
the caches shouldn't be needed, and they can help with query
performance.

Also, there are tools out there that can help you diagnose what the
actual problem is, for example (http://sematext.com/spm/index.html).

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 This is an example of a query:

 http://myip:8080/solr/TestCatMatch_shard12_replica1/select?q=Royal+Cashmere+RC+106+CS+Silk+Cashmere+V+Neck+Moss+Green+Men^10+s+Sweater+Cashmere^3+Men^3+Sweaters^3+Clothing^3&rows=1&wt=json&indent=true

 in return:

 {
   "responseHeader":{
     "status":0,
     "QTime":191},
   "response":{"numFound":4539784,"start":0,"maxScore":2.0123534,"docs":[
     {
       "Sections":"fashion",
       "IdsCategories":"11101911",
       "IdProduct":"ef6b8d7cf8340d0c8935727a07baebab",
       "Id":"11101911-ef6b8d7cf8340d0c8935727a07baebab",
       "Name":"Uniqlo Men Cashmere V Neck Sweater Men Clothing Sweaters Cashmere",
       "_version_":1455419757424541696}]
   }}

 This query was executed when no script was running, so the QTime is only
 191 ms, but it may take up to 3s when they are running.


 Of course it can be smaller or bigger, and of course that affects the
 execution time (the execution times I spoke of are the internal ones
 returned by Solr, not calculated by me).

 And yes, the CPU is fully used.


 2013/12/26 Rafał Kuć r@solr.pl

 Hello!

 Different queries can have different execution time, that's why I
 asked about the details. When running the scripts, is Solr CPU fully
 utilized? To tell more I would like to see what queries are run
 against Solr from scripts.

 Do you have any information on network throughput between the server
 you are running scripts on and the Solr cluster? You wrote that the
 scripts are fine for 5 seconds and than they get slow. If your Solr
 cluster is not fully utilized I would take a look at the queries and
 what they return (ie. using faceting with facet.limit=-1) and seeing
 if the network is able to process those.

 --
 Regards,
  Rafał Kuć
 Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/


  Thanks Rafal for your reply,

  My scripts are running on other independent machines so they does not
  affect Solr, I did mention that the queries are not the same (that is
 why I
  removed the query cache from solrconfig.xml), and I only get 1 result
 from
  Solr (which is the top scored one so no sorting since it is by default
  ordred by score)



  2013/12/26 Rafał Kuć r@solr.pl

  Hello!
 
  Could you tell us more about your scripts? What they do? If the
  queries are the same? How many results you fetch with your scripts and
  so on.
 
  --
  Regards,
   Rafał Kuć
  Performance Monitoring * Log Analytics * Search Analytics
   Solr & Elasticsearch Support * http://sematext.com/
 
 
   Hi all,
 
   I have multiple python scripts querying solr with the sunburnt module.
 
   Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB
  memory
840 GB storage) and contained several cores for different usage.
 
   When I manually executed a query through Solr Admin (a query
 containing
   10~15 terms, with some of them having boosts over one field and
 limited
  to
   one result without any sorting or faceting etc ) it takes around
 700
   ms, and the Core contained 7 million documents.
 
   When the scripts are executed things get slower, my query takes 7~10s.
 
   Then what I did is to turn to SolrCloud expecting huge performance
  increase.
 
   I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8
 vCPU
   with 28 ECU, 15 GB memory  160 SSD storage), then I created one
  collection
   to contain the core I was querying, I sharded it to 25 shards (each
 node
   containing 5 shards without replication), each shards took 54 MB of
  storage.
 
   Tested my query on the new SolrCloud, it takes 70 ms ! huge increase
 wich
   is very good !
 
   Tested my scripts again (I have 30 scripts running at the same time),
 and
   as a surprise, things run fast for 5 seconds then it turns realy slow
  again
   (query time ).
 
   I updated the solrconfig.xml to remove the query caches (I don't need
  them
   since queries are very different and only 1 time queries) and changes
 the
   index memory to 1 GB, but only got a small increase (3~4s for each
 query
  ?!)
 
   Any ideas ?
 
   PS: My index size will not stay with 7m documents, it will grow to
 +100m
   and that may get things worse
 
 





Re: subscribe for this maillist

2013-12-12 Thread Rafał Kuć
Hello!

To subscribe please send a mail to solr-user-subscr...@lucene.apache.org

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 I want to subscribe to this Solr mailing list.

 Thanks and Best Regards,

 Gabriel Zhang



Re: What is the difference between attorney:(Roger Miller) and attorney:Roger Miller

2013-11-19 Thread Rafał Kuć
Hello!

In the first one, the two terms 'Roger' and 'Miller' are run against
the attorney field. In the second the 'Roger' term is run against the
attorney field and the 'Miller' term is run against the default search
field.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 We got different results for these two queries. The first one returned 115
 records and the second returns 179 records.

 Thanks,

 Fudong



Re: What is the difference between attorney:(Roger Miller) and attorney:Roger Miller

2013-11-19 Thread Rafał Kuć
Hello!

Terms surrounded by " characters will be treated as a phrase query. So,
if your default query operator is OR, the attorney:(Roger Miller) query will
result in documents with the first or second (or both) terms in the
attorney field. The attorney:"Roger Miller" query will result only in
documents that have the phrase "Roger Miller" in the attorney field.

You may want to look at Lucene query syntax to understand all the
differences: 
http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description
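
To make the contrast concrete (default operator OR assumed):

attorney:(Roger Miller)   ->  attorney:Roger OR attorney:Miller
attorney:"Roger Miller"   ->  the exact phrase "Roger Miller" in the attorney field
attorney:Roger Miller     ->  attorney:Roger OR <default field>:Miller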

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Also, attorney:(Roger Miller) is the same as attorney:"Roger Miller", right? Or
 is the phrase "Roger Miller" run against attorney?

 Thanks,
 -Utkarsh


 On Tue, Nov 19, 2013 at 12:42 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 In the first one, the two terms 'Roger' and 'Miller' are run against
 the attorney field. In the second the 'Roger' term is run against the
 attorney field and the 'Miller' term is run against the default search
 field.

 --
 Regards,
  Rafał Kuć
 Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/


  We got different results for these two queries. The first one returned
 115
  records and the second returns 179 records.

  Thanks,

  Fudong






Re: Solr Synonym issue

2013-11-14 Thread Rafał Kuć
Hello!

Could you please describe the issue you are having?

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



Hi Team,

I have implemented Solr with my Magento Enterprise Edition. I am trying to
implement synonyms in Solr but it's not working. Please find attached the
synonyms.txt, schema.xml and solrconfig.xml files.

I have been debugging the issue for 2 days but have not found any solution.
Please help me as soon as possible.

Hope i will get your reply as soon as possible.

Thanks & Regards
Jyoti Kadam



Re: Stop solr service

2013-10-27 Thread Rafał Kuć
Hello!

Could you please write more about what you want to do? Do you need to
stop a running Solr process? If yes, what you need to do is stop the
container (Jetty/Tomcat) that Solr runs in. You can also kill the JVM
running Solr, however it will usually be enough to just stop the
container.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi Team,

 Please stop the solr service.



Re: Stop solr service

2013-10-27 Thread Rafał Kuć
Hello!

Please look at the Solr user list section of
http://lucene.apache.org/solr/discussion.html on how to unsubscribe
from the list.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 I want to stop the mail


 On Sun, Oct 27, 2013 at 4:37 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 Could you please write more about what you want to do? Do you need to
 stop running Solr process. If yes what you need to do is stop the
 container (Jetty/Tomcat) that Solr runs in. You can also kill JVM
 running Solr, however it will be usually enough to just stop the
 container.

 --
 Regards,
  Rafał Kuć
 Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/


  Hi Team,

  Please stop the solr service.






Re: How to configure solr to our java project in eclipse

2013-10-27 Thread Rafał Kuć
Hello!

If you really want to go embedded way, please look at 
http://wiki.apache.org/solr/EmbeddedSolr
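
For reference, a hedged minimal sketch against the Solr 4.x SolrJ API
(the solr home path and core name are assumptions):

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedSolrDemo {
    public static void main(String[] args) throws Exception {
        // point the container at your solr home (the directory with solr.xml)
        CoreContainer container = new CoreContainer("/path/to/solr/home");
        container.load();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");
        // ... index and query through the regular SolrServer API ...
        server.shutdown(); // also shuts down the core container
    }
}

That said, for a Tomcat-based project it is usually simpler to deploy the
Solr war into the same Tomcat instance than to embed it.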

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 Hi friends, I am Giridhar. Please clarify my doubt.

 We are using Solr for our project. The problem is that Solr is outside of our
 project (in another folder).

 We have to manually type java -jar start.jar to start Solr and use its
 services.

 But what we need is: when we run the project, Solr should start
 automatically.

 Our project is a Java project with Tomcat in Eclipse.

 How can I achieve this?

 Please help me.

 Thank you.
 Giridhar



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-configure-solr-to-our-java-project-in-eclipse-tp4097954.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 4.6.0 latest build

2013-10-22 Thread Rafał Kuć
Hello!

The current development version of Solr can be found in the SVN
repository - https://lucene.apache.org/solr/versioncontrol.html

You need to download the source code and build Solr yourself. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hi!

 I'm using Solr 4.4.0 currently and I'm having quite some trouble with 
 * SOLR-5216: Document updates to SolrCloud can cause a distributed deadlock.
   (Mark Miller)
 which should be fixed for 4.6.0.

 Where could I get Solr 4.6.0 from? I want to make some tests regarding this
 fix.
 Thank you!



 -
 Thanks,
 Michael
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-4-6-0-latest-build-tp4096960.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 4.6.0 latest build

2013-10-22 Thread Rafał Kuć
Hello!

Thanks for correction :)

Did you report that the issue still persists?

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr  Elasticsearch Support * http://sematext.com/


 That's not quite right.

 You don't have to download the source or build it yourself. You can grab a
 4.6 snapshot build from here:

 https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/

 I have an issue with updates no longer being processed, and the snapshots
 have not resolved the issue for me.

 Cheers,
 Chris

 On Tuesday, October 22, 2013, Rafał Kuć wrote:

 Hello!

 The current development version of Solr can be found in the SVN
 repository - https://lucene.apache.org/solr/versioncontrol.html

 You need to download the source code and build Solr yourself.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

  Hi!

  I'm using Solr 4.4.0 currently and I'm having quite some trouble with
  * SOLR-5216: Document updates to SolrCloud can cause a distributed
 deadlock.
(Mark Miller)
  which should be fixed for 4.6.0.

  Where could I get Solr 4.6.0 from? I want to make some tests regarding
 this
  fix.
  Thank you!



  -
  Thanks,
  Michael
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Solr-4-6-0-latest-build-tp4096960.html
  Sent from the Solr - User mailing list archive at Nabble.com.





Re: Seeking New Moderators for solr-user@lucene

2013-10-18 Thread Rafał Kuć
Hello!

I can help with moderation. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch


 It looks like it's time to inject some fresh blood into the 
 solr-user@lucene moderation team.

 If you'd like to volunteer to be a moderator, please reply back to this
 thread and specify which email address you'd like to use as a moderator
 (if different from the one you use when sending the email)

 Being a moderator is really easy: you'll get a some extra emails in your
 inbox with MODERATE in the subject, which you skim to see if they are spam
 -- if they are you delete them, if not you reply all to let them get
 sent to the list, and authorize that person to send future messages w/o
 moderation.

 Occasionally, you'll see an explicit email to solr-user-owner@lucene from
 a user asking for help realted to their subscription (usually 
 unsubscribing problems) and you and the other moderators chime in with
 assistance when possible.

 More details can be found here...

 https://wiki.apache.org/solr/MailingListModeratorInfo

 (I'll wait ~72+ hours to see who responds, and then file the appropriate
 jira with INFRA)


 -Hoss



Re: faceting from multiValued field

2013-09-10 Thread Rafał Kuć
Hello!

Your field needs to be indexed in order for faceting to work.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hi,

 I am having a problem with multiValued field and Faceting

 This is the schema:
 field name=series type=string indexed=false stored=true
 required=false omitTermFreqAndPositions=true multiValued=true /

 all I get is:
 lst name=series /


 Note: the data is correctly placed in the field as the query results shows.
 However, the facet is not working.

 Could anyone tell me how to achieve it?

 Thanks a lot.




Re: change schema.xml without bringing down the solr

2013-08-29 Thread Rafał Kuć
Hello!

You can upload new schema.xml (along with other configuration files)
and reload the collection using collections API
(http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API).
However you have remember that in order for the new field to be usable
documents needs to be re-indexed.

You can't just add new shard (you can add replicas though), but you
can re-shard your current collection. Look at the collections API and
the SPLITSHARD command, don't know which Solr you are using, but the
SPLITSHARD is supported since Solr 4.3.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hi,

 I have couple of usecase  need to be implemented.

 1. Out application is 24 X 7 , so we require search feature to be available
 24 X 7.
 How to add new field to live solr node , without brining down the solr
 instance ?

 2. How to additional shard  to existing collection and re distribute the
 index ?

 Thanks,
 ~sathish




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/change-schema-xml-without-bringing-down-the-solr-tp4087184.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Concat 2 fields in another field

2013-08-27 Thread Rafał Kuć
Hello!

You don't have to write custom component - you can use
ScriptUpdateProcessor - http://wiki.apache.org/solr/ScriptUpdateProcessor

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hello all ,

 I am using solr 4.x , I have a requirement where I need to have a field
 which holds data from 2 fields concatenated using _. So for example I have 2
 fields firstName and lastName , I want a third field which should hold
 firstName_lastName. Is there any existing concatenating component available
 or I need to write a custom updateProcessor which does this task. By the way
 need for having this third field is that I want to group on the
 firstname,lastName but as grouping does not support multiple fields to form
 single group I am using this trick. Hope I am clear . 

 Thanks .




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: JSON Update create different copies of the same document

2013-08-08 Thread Rafał Kuć
Hello!

Do you have the unique identifier specified in the schema.xml ?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hello,

 I thought that by adding a new document with the same id on Solr the
 document already on Solr would be updated with the new info. But both
 documents appear on the search results... How can I update a document?

 Regards
 Bruno Santos



Re: Problems installing Solr4 in Jetty9

2013-08-08 Thread Rafał Kuć
Hello!

Could you look at the logs and post what you find there?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 I've been unable to install SOLR into Jetty. Jetty seems to be running fine,
 and this is the steps I took to install solr:

 # SOLR
 cd /opt
 wget -O - $SOLR_URL | tar -xzf -
 cp solr-4.4.0/dist/solr-4.4.0.war /opt/jetty/webapps/solr.war
 cp -R solr-4.4.0/example/solr /opt/
 cp -R /opt/solr-4.4.0/dist/ /opt/solr/
 cp -R /opt/solr-4.4.0/contrib/ /opt/solr/
 rm -rf /opt/solr-4.4.0
 echo 'JAVA_OPTIONS=-Dsolr.solr.home=/opt/solr $JAVA_OPTIONS'
 /etc/default/jetty 
 chown -R jetty:jetty /opt/solr/
 /etc/init.d/jetty restart
 When I browse to the directory it says:

 HTTP ERROR: 503 Problem accessing /solr. Reason: Service Unavailable Powered
 by Jetty://

 Is anyone able to tell me of any steps that I have missed for installing it,
 or how to diagnose it?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Problems-installing-Solr4-in-Jetty9-tp4083209.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Enabling DIH breaks Solr4.4

2013-08-08 Thread Rafał Kuć
Hello!

Try changing the lib directives to absolute paths, by looking at the
exception:

Caused by: java.lang.ClassNotFoundException:
org.apache.solr.handler.dataimport.DataImportHandler

it seems that the DataImportHandler class is not seen by Solr
class loader.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hi,

 I'm a bit stuck here. I had Solr4.4 working without too many issues. I
 wanted to enable the DIH so I firstly added these lines to the
 solrconfig.xml:

 lib dir=../contrib/dataimporthandler/lib regex=.*\.jar /
 lib dir=../dist/ regex=apache-solr-dataimporthandler-.*\.jar /

 Restarted and everything was fine. Then I added these lines to the config as
 well which broke Solr:

 requestHandler name=/dataimport
 class=org.apache.solr.handler.dataimport.DataImportHandler
 lst name=defaults
   str
 name=config/opt/solr/collection1/conf/data-config.xml/str
 /lst
   /requestHandler

 This is the error it gives me:

 HTTP ERROR 500

 Problem accessing /solr/. Reason:

 {msg=SolrCore 'collection1' is not available due to init failure:
 RequestHandler init
 failure,trace=org.apache.solr.common.SolrException:
 SolrCore 'collection1' is not available due to init failure: RequestHandler
 init failure
 at
 org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:540)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1094)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1028)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at
 org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:317)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at org.eclipse.jetty.server.Server.handle(Server.java:445)
 at
 org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:267)
 at
 org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:224)
 at
 org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
 at java.lang.Thread.run(Thread.java:724)
 Caused by: org.apache.solr.common.SolrException: RequestHandler init failure
 at org.apache.solr.core.SolrCore.init(SolrCore.java:835)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:629)
 at
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
 at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
 at
 org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
 at
 org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 ... 1 more
 Caused by: org.apache.solr.common.SolrException: RequestHandler init failure
 at
 org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:167)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:772)
 ... 13 more
 Caused by: org.apache.solr.common.SolrException: Error loading class

Re: Solr search on a large text field is very slow

2013-08-08 Thread Rafał Kuć
Hello!

In general, wildcard queries can be expensive. If you need wildcard
queries, you can try the EdgeNGram - 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Index size is around 150 GB and there are around 6.5 million documents in the
 index. Search on a specific text field is very slow, it takes 1 minute to 2
 minute for wildcard queries like *test*  with no highlighting and no facets
 This field contributes to 90% of index size.
 This is my shema.xml

 fieldType name=text_pl class=solr.TextField
   analyzer
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.StopFilterFactory words=stopwords.txt
ignoreCase=true/
 filter class=solr.WordDelimiterFilterFactory
 generateWordParts=0 generateNumberParts=0 catenateWords=0
 catenateNumbers=0 catenateAll=0 splitOnCaseChange=0/
   /analyzer
 /fieldType

  /types


  fields
  
  field name=syndrome type=text_pl indexed=true stored=true
 required=true multivalued=false omitNorms=false/
  field name=test_file_result_id type=long indexed=true stored=true
 omitNorms=true multivalued=false/
  field name=start_date type=date indexed=true stored=true
 omitNorms=true multivalued=false/
  field name=job_id type=long indexed=true stored=true
 omitNorms=true multivalued=false/
  field name=test_run_id type=long indexed=true stored=true
 omitNorms=true multivalued=false/
  field name=cluster type=string indexed=true stored=true
 omitNorms=true multivalued=false/
  field name=logfile type=text_ws indexed=true stored=true
 omitNorms=true multivalued=false/


 I am using DIH for indexing data from Database. Largest syndrome field
 size is 5MB it can range from 5MB to 1KB

 I tried using whitespacetokeniser with not much luck, 
 I am using solr3.6.0
 Indexing takes 1.5 hours.

 Please let me know , if I need to add anything to improve the search speed.

 Thanks
 Meena







 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-search-on-a-large-text-field-is-very-slow-tp4083310.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud replication server and Master Server

2013-08-01 Thread Rafał Kuć
Hello!

There is no master - slave in SolrCloud. You can create a collection
that has only a single shard and have one replica. When you send an
indexing request it will be forwarded to the leader shard (in your
case you want it to be the instance running on 8080), however
both Solr instances will accept indexing requests.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 I have a requirement to set solrcloud with 2 instances of Solr( one on 8080
 and otehr on 9090 ports respectivey)  and a Zookeeper ensembe( 3 modes).
 Can I have one solr instance as a Master and the other as a replcia of the
 Master.

 Because, when i set up a solrcloud and index to one of the solr running on
 8080, it automatically routes to 9090 solr also.

 I need indexing only on 8080 and 9090 solr should be a replica of 8080solr.
 Is this possible in Cloud.'

 Pls guide me.



Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2013-08-01 Thread Rafał Kuć
Hello!

The exception you've shown tells you that Solr tried to allocate an
array that exceeded heap size. Do you use some custom sorts? Did you
send large bulks during the time that the exception occurred?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Today I found in solr logs exception: java.lang.OutOfMemoryError: Requested
 array size exceeds VM limit.
 At that time memory usage was ~200MB / Xmx3g

 Env looks like this:
 3x standalone zK (Java 7)
 3x Solr 4.2.1 on Tomcat (Java 7)
 Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux
 One Solr and one ZK on single host: lmsiprse01, lmsiprse02, lmsiprse03

 Before exception I start restarting Solr:
 lmsiprse01:
 [2013-08-01 05:23:43]: /etc/init.d/tomcat6-1 stop
 [2013-08-01 05:25:09]: /etc/init.d/tomcat6-1 start
 lmsiprse02 (leader):
 2013-08-01 05:27:21]: /etc/init.d/tomcat6-1 stop
 2013-08-01 05:29:31]: /etc/init.d/tomcat6-1 start
 lmsiprse03:
 [2013-08-01 05:25:48]: /etc/init.d/tomcat6-1 stop
 [2013-08-01 05:26:42]: /etc/init.d/tomcat6-1 start

 and error shows up at 2013 5:27:26 on lmsiprse01

 Is anybody knows what happened?

 fragment of log looks:
 sie 01, 2013 5:27:26 AM org.apache.solr.core.SolrCore execute
 INFO: [products] webapp=/solr path=/select
 params={facet=truestart=0q=facet.limit=-1facet.field=attribute_u-typfacet.field=attribute_u-gama-kolorystycznafacet.field=brand_namewt=javabinfq=node_id:1056version=2rows=0}
 hits=1241 status=0 QTime=33
 sie 01, 2013 5:27:26 AM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError:
 Requested array size exceeds VM limit
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
 at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:724)
 Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 at org.apache.lucene.util.PriorityQueue.init(PriorityQueue.java:64)
 at org.apache.lucene.util.PriorityQueue.init(PriorityQueue.java:37)
 at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue.init(ShardDoc.java:113)
 at
 org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:766)
 at
 org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:625)
 at
 org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:604)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
 ... 13 more



Re: SolrCloud replication server and Master Server

2013-08-01 Thread Rafał Kuć
Hello!

Take a look at http://wiki.apache.org/solr/SolrCloud - when creating a
collection you can specify the maxShardsPerNode parameter, which
allows you to control how many shards of a given collection is
permitted to be placed on a single node. By default it is set to 1,
try setting it to 2 when creating your collection.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 I am rephrasing my question,

 Currently what i haev is

Solr 8080 - Shard1Shard2- Replica
Solr 9090 - Shard 2 Shard 1 - Replica

 Can we have something like

Solr 8080 - Shard 1, Shard 2
solr 9090 -  Shard2-Replica , Shard1- Replica



 On Thu, Aug 1, 2013 at 6:40 PM, Prasi S prasi1...@gmail.com wrote:

 Here when i create a single shard and a replica, then my shard will be on
 one server and replcia in teh other isn't ?


 On Thu, Aug 1, 2013 at 6:29 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 There is no master - slave in SolrCloud. You can create a collection
 that has only a single shard and have one replica. When you send an
 indexing request it will be forwarded to the leader shard (in your
 case you want it to be the instance running on 8080), however
 both Solr instances will accept indexing requests.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

  I have a requirement to set solrcloud with 2 instances of Solr( one on
 8080
  and otehr on 9090 ports respectivey)  and a Zookeeper ensembe( 3 modes).
  Can I have one solr instance as a Master and the other as a replcia of
 the
  Master.

  Because, when i set up a solrcloud and index to one of the solr running
 on
  8080, it automatically routes to 9090 solr also.

  I need indexing only on 8080 and 9090 solr should be a replica of
 8080solr.
  Is this possible in Cloud.'

  Pls guide me.






Re: AND Queries

2013-07-29 Thread Rafał Kuć
Hello!

Try turning on debugQuery and see what I happening. From what I see
you are searching the en term in lang field, the book term in url
field and the pencil and cat terms in the default search field, but
from your second query I see that you would like to find the last two
terms in the url.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 I am searching for a keyword as like that:

 lang:en AND url:book pencil cat

 It returns me results however none of them includes both book, pencil and
 cat keywords. How should I rewrite my query?

 I tried this:

 lang:en AND url:(book AND pencil AND cat)

 and looks like OK. However this not:


 lang:en AND url:book AND pencil AND cat

 why?



Re: Changing the number of shards?

2013-07-06 Thread Rafał Kuć
Hello!

Just to be perfectly clear here - Solr has advantage over
ElasticSearch because it can split the live index. However, this
sentence from Jack's mail is not true:

In other words, even though you can easily “add shards” on ES,
those are really just replicas of existing primary shards and cannot
be new primary shard

The replicas can become new primary shards if something happens to a
primary shard - this is similar to what SolrCloud supports. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Correct.
 ES currently does not let you change the number of shards after you've
 created an Index (Collection in SolrCloud).
 It does not let you split shards either.  SolrCloud has an advantage
 over ES around this at this point.

 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Fri, Jul 5, 2013 at 6:51 PM, Jack Krupansky j...@basetechnology.com 
 wrote:
 According to the ElasticSearch glossary, “You cannot change the number of 
 primary shards in an index, once the index is created.” Really? Is that 
 true? (A “primary shard” is what Solr calls a shard, or slice.)

 In other words, even though you can easily “add shards” on ES, those are 
 really just replicas of existing primary shards and cannot be new primary 
 shards – if I understand that statement correctly. Besides the simple fact 
 that if a primary shard goes down, a non-primary shard can be promoted to 
 being a primary shard (“a replica shard can be promoted to a primary shard 
 if the primary fails.”)

 See:
 http://www.elasticsearch.org/guide/reference/glossary/

 My understanding is that with Solr you can do things like split shards and 
 change the number of shards per node. Is this an advantage that ES does not 
 offer?

 Any ES experts want to comment?

 -- Jack Krupansky



Re: change solr core schema and config via http

2013-06-28 Thread Rafał Kuć
Hello!

In 4.3.1 you can only read schema.xml or portions of it using Schema
API (https://issues.apache.org/jira/browse/SOLR-4658). It is a start
to allow schema.xml modifications using HTTP API, which will be a
functionality of next release of Solr -
https://issues.apache.org/jira/browse/SOLR-3251

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hi,

 I am trying to figure out how to change the schema/config of an
 existing core or a core to be created via http calls to solr.  After
 spending hours in searching online, I still could not find any
 documents showing me how to do it.

 The only way I know is that you have to log on to the solr host and
 then create/modify the schema/config files on the host. It seem
 surprising to me that we can create new core via the http interface
 but are not able to change the schema/config in the same way.

 Can anyone give me some hints? Thanks.

 Regards,

 James



Re: dynamic field

2013-06-17 Thread Rafał Kuć
Hello!

Dynamic field is just a regular field from Lucene point of view, so
its content will be treated just like the content of other fields. The
difference in on the Solr level.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 How is daynamic field in solr implemented?  Does it get saved into the same
 Document as other regular fields in lucene index?

 Ming-



Re: Solr 3.5 Optimization takes index file size almost double

2013-06-13 Thread Rafał Kuć
Hello!

Optimize command needs to rewrite the segments, so while it is
still working you may see the index size to be doubled. However after
it is finished the index size will be usually lowered comparing to the
index size before optimize.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Hi,
 I have solr server 1.4.1 with index file size 428GB.Now When I upgrade solr
 Server 1.4.1 to Solr 3.5.0 by replication method. Size remains same.
 But when optimize index for Solr 3.5.0 instance its size reaches 791GB.so
 what is solutions for size remains same or lesser.
 I optimize Solr 3.5 with Query:
 solr3.5 url/update?optimize=truecommit=true

 Thanks  regards
 Viresh Modi



Re: Solr 3.5 Optimization takes index file size almost double

2013-06-13 Thread Rafał Kuć
Hello!

Do you have some backup after commit in your configuration? It would
also be good to see how your index directory looks like, can you list
that ?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

 Thanks Rafal for reply...

 I agree with you. But Actually After optimization , it does not reduce size
 and it remains double. so is there any thing we missed or need to do for
 achieving index size reduction ?

 Is there any special setting we need to configure for replication?




 On 13 June 2013 16:53, Rafał Kuć r@solr.pl wrote:

 Hello!

 Optimize command needs to rewrite the segments, so while it is
 still working you may see the index size to be doubled. However after
 it is finished the index size will be usually lowered comparing to the
 index size before optimize.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

  Hi,
  I have solr server 1.4.1 with index file size 428GB.Now When I upgrade
 solr
  Server 1.4.1 to Solr 3.5.0 by replication method. Size remains same.
  But when optimize index for Solr 3.5.0 instance its size reaches 791GB.so
  what is solutions for size remains same or lesser.
  I optimize Solr 3.5 with Query:
  solr3.5 url/update?optimize=truecommit=true

  Thanks  regards
  Viresh Modi





Re: FW: Solr and Lucene

2013-06-12 Thread Rafał Kuć
Hello!

Solr 4.2.1 is using Lucene 4.2.1. Basically Solr and Lucene are
currently using the same numbers after their development was merged.

As far for Luke I think that the last version is using beta or alpha
release of Lucene 4.0. I would try replacing Lucene jar's and see if
it works although I didn't try it.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi,

 Which lucene version is used with Solr 4.2.1? And is it possible to open it
 by luke? If not by any other tool? Thanks

 Thanks



Re: /non/existent/dir/yields/warning

2013-06-03 Thread Rafał Kuć
Hello!

You should remove that entry from your solrconfig.xml file. It is
something like this:

  lib dir=/non/existent/dir/yields/warning / 


-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi,

 I am constantly getting this error in my solr log:

 Can't find (or read) directory to add to classloader:
 /non/existent/dir/yields/warning (resolved as:
 E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).

 Anyone got any idea on how to solve this




Re: /non/existent/dir/yields/warning

2013-06-03 Thread Rafał Kuć
Hello!

That's a good question. I suppose its there to show users how to setup
a custom path to libraries.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 ok thanks :)

 But why was it there anyway? I mean it says in comments:
 If a 'dir' option (with or without a regex) is used and nothing
 is found that matches, a warning will be logged.

 So it looks like a kind of exception handling or logging for libs not
 found... so shouldnt this folder actually exist?





 On Mon, Jun 3, 2013 at 2:06 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 You should remove that entry from your solrconfig.xml file. It is
 something like this:

   lib dir=/non/existent/dir/yields/warning /


 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,

  I am constantly getting this error in my solr log:

  Can't find (or read) directory to add to classloader:
  /non/existent/dir/yields/warning (resolved as:
 
 E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).

  Anyone got any idea on how to solve this







Re: Solr/Lucene Analayzer That Writes To File

2013-05-27 Thread Rafał Kuć
Hello!

Take a look at custom posting formats. For example
here  is  a  nice  post showing what you can do with Lucene SimpleText
codec:
http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html

However  please  remember  that it is not advised to use that codec in
production environment.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi;

 I want to use Solr for an academical research. One step of my purpose is I
 want to store tokens in a file (I will store it at a database later) and I
 don't want to index them. For such kind of purposes should I use core
 Lucene or Solr? Is there an example for writing a custom analyzer and just
 storing tokens in a file?



Re: search filter

2013-05-22 Thread Rafał Kuć
Hello!

You can try sending a filter like this fq=Salary:[5+TO+10]+OR+Salary:0

It should work

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Dear All
 Can I write a search filter for a field having a value in a range or a
 specific value.

 Say if I want to have a filter like
 1. Select profiles with salary 5 to 10  or Salary 0.

 So I expect profiles having salary either 0 , 5, 6, 7, 8, 9, 10 etc.

 It should be possible, can somebody help me with syntax of 'fq' filter
 please.

 Best Regards
 kamal



Re: Mandatory words search in SOLR

2013-05-13 Thread Rafał Kuć
Hello!

Change  the  default  query  operator. For example add the q.op=AND to
your query.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi SOLR Experts
 When I search documents with keyword as *java, mysql* then I get the
 documents containing either *java* or *mysql* or both.

 Is it possible to get the documents those contains both *java* and *mysql*.

 In that case, how the query would look like.

 Thanks a lot
 Kamal



Re: Indexing Point Number

2013-05-08 Thread Rafał Kuć
Hello!

Use a float field type in your schema.xml file, for example like this:
fieldType name=float class=solr.TrieFloatField precisionStep=0 
positionIncrementGap=0/

Define a field using this type:
field name=price  type=float indexed=true stored=true/

You'll be able to index data like this:
field name=price19.95/field

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,
  how can I indexing numbers with decimal point.
  For example:
  5,50
  109,90

  I want to sort the numbers.

  Thanks

  



Re: Prediction About Index Sizes of Solr

2013-04-08 Thread Rafał Kuć
Hello!

Let me answer the first part of your question. Please have a look at
https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
It should help you make an estimation about your index size. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 This may not be a well detailed question but I will try to make it clear.

 I am crawling web pages and will index them at SolrCloud 4.2. What I want
 to predict is the index size.

 I will have approximately 2 billion web pages and I consider each of them
 will be 100 Kb.
 I know that it depends on storing documents, stop words. etc. etc. If you
 want to ask about detail of my question I may give you more explanation.
 However there should be some analysis to help me because I should predict
 something about what will be the index size for me.

 On the other hand my other important question is how SolrCloud makes
 replicas for indexes, can I change it how many replicas will be. Because I
 should multiply the total amount of index size with replica size.

 Here I found an article related to my analysis:
 http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/

 I know this question may not be details but if you give ideas about it you
 are welcome.



Re: Don't cache filter queries

2013-03-21 Thread Rafał Kuć
Hello!

Just add {!cache=false} to the filter in your query
(http://wiki.apache.org/solr/SolrCaching#filterCache).

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 I need to use the filter query feature to filter my results, but I
 don't want the results cached as documents are added to the index
 several times per second and the results will be state immediately. Is
 there any way to disable filter query caching?

 This is on Solr 4.1 running in Jetty on Ubuntu Server. Thanks.



Re: Search data who does not have x field

2013-03-13 Thread Rafał Kuć
Hello!

You can for example add a filter, that will remove the documents that
doesn't have a value in a field, like:

fq=!category:[*+TO+*]

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi all,

 I am facing a problem. 

 Problem is:

 I have updated 250 data to solr. 


 and some of data have category field and some of don't have.

 for example.


 {
 id:321,
 name:anurag,
 category:x
 },
 {
 id:3,
 name:john
 }


 now i want to search that data who does not have that field. 

 what query should like. 


 please reply 

 It is very urgent - i have to complete this task by today itself 

 thanks in advance.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Search-data-who-does-not-have-x-field-tp4046959.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Rafał Kuć
Hello!

I suppose the only way to make this work will be reindexing the data.
Solr 4.1 uses Lucene 4.1 as you know, which introduced new default
codec with stored fields compression and this is one of the reasons
you can't read that index with 4.0.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Solr 4.1 has been giving up much trouble rejecting documents indexed.
 While I try to work my way through this, I would like to move our
 application back to Solr 4.0. However, now when I try to start Solr
 with same index that was created with Solr 4.0 but has been running on
 4.1 few a few days I get this error chain:

 org.apache.solr.common.SolrException: Error opening new searcher
 Caused by: org.apache.solr.common.SolrException: Error opening new searcher
 Caused by: java.lang.IllegalArgumentException: A SPI class of type
 org.apache.lucene.codecs.Codec with name 'Lucene41' does not exist.
 You need to add the corresponding JAR file supporting this SPI to your
 classpath.The current classpath supports the following names:
 [Lucene40, Lucene3x]

 Obviously I'll not be installing Lucene41 in Solr 4.0, but is there
 any way to work around this? Note that neither solrconf.xml nor
 schema.xml have changed. Thanks.




Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Rafał Kuć
Hello!

As far as I know you have to re-index using external tool.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 On Fri, Mar 1, 2013 at 11:59 AM, Rafał Kuć r@solr.pl wrote:
 Hello!

 I assumed that re-indexing can be painful in your case, if it wouldn't
 you probably would re-index by now :) I guess (didn't test it myself),
 that you can create another collection inside your cluster, use the
 old codec for Lucene 4.0 (setting the version in solrconfig.xml should
 be enough) and re-indexing, but still re-indexing will have to be
 done. Or maybe someone knows a better way ?


 Will I have to reindex via an external script bridging, such as a
 Python script which requests N documents at a time, indexes them into
 Solr 4.1, then requests another N documents to index? Or is there
 internal Solr / Lucene facility for this? I've actually looked for
 such a facility, but as I am unable to find such a thing I ask.




Re: solr 4.1 - trying to create 2 collection with 2 different sets of configurations

2013-02-28 Thread Rafał Kuć
Hello!

You can try doing the following:

1. Run Solr with no collection and no cores, just an empty solr.xml

2. If you don't have a ZooKeeper run Solr with -DzkRun

3. Upload you configurations to ZooKeeper, by running

cloud-scripts/zkcli.sh -cmdupconfig -zkhost localhost:9983 -confdir 
CONFIGURATION_1_DIR -confname COLLECTION_1_NAME

and

cloud-scripts/zkcli.sh -cmdupconfig -zkhost localhost:9983 -confdir 
CONFIGURATION_2_DIR -confname COLLECTION_2_NAME

4. Create those two collections:

curl 
'http://localhost:8983/solr/admin/collections?action=CREATEname=COLLECTION_1_NAMEnumShards=2replicationFactor=0'

and

curl 
'http://localhost:8983/solr/admin/collections?action=CREATEname=COLLECTION_2_NAMEnumShards=2replicationFactor=0'

Of course the CONFIGURATION_1_DIR and CONFIGURATION_2_DIR is the
directory where your configurations are stored and the
COLLECTION_1_NAME and the COLLECTION_2_NAME are your collection names.

Also adjust the numShards and replicationFactor to your needs.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 solr 4.1 - trying to create 2 collection with 2 different sets of
 configurations.
 Anyone accomplished this?

 if I run bootstraop twice on different conf dirs, I get both of them in
 zookeper, but using collections API to create a collection if
 collection.configName=seconfConf doesnt work.

 any idea?

 thanks.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-4-1-trying-to-create-2-collection-with-2-different-sets-of-configurations-tp4043609.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 4.1 - trying to create 2 collection with 2 different sets of configurations

2013-02-28 Thread Rafał Kuć
Hello!

Yes I did :)

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Thanks
 I'm going to try this. 

 Have you tried it yourself?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-4-1-trying-to-create-2-collection-with-2-different-sets-of-configurations-tp4043609p4043617.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: id field doesn't match

2013-02-25 Thread Rafał Kuć
Hello!

If you what you need is an exact match, try using the simple string
type. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi all,

 i have an id field wich always contains a string with that schema 
 vw-200130315-
 Wich field type and settings should i use to get exactly this id as a result.
 Actually i always get more then one result.

 Kind regards
 Benjamin



Re: Is solr reindex all documents for each commit?

2013-02-20 Thread Rafał Kuć
Hello!

You don't need to re-index all the documents, you only need to index
the new ones.


-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi 

   We have a requirement  to add 10 or 100 new documents to solr on a daily
 basis. Our solr application has 3 million documents. So whenever i add new
 documents, do i need to reindex all documents(3 million)?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Is-solr-reindex-all-documents-for-each-commit-tp4041470.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexed And Stored

2013-02-12 Thread Rafał Kuć
Hello!

The simplest way will be updating your schema.xml file, do the change
that needs to be done and fully re-index your data. Solr wont be able
to automatically change not indexed field to indexed one.

You could also use the partial document update API of Solr if you
don't have your original data, however there are a few limitations. In
order to fully reconstruct your documents you would have to have all
the fields stored in your index. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 hello, 

 in my schema

 field name=city_name type=text_general indexed=false stored=true/

 and i updated 18 data. 


 now i need indexed=true for all old data. 

 i need solution  

 please someone help me out. 

 please reply urgent!!
 thanks  



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Indexed-And-Stored-tp4039893.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Maximum Number of Records In Index

2013-02-07 Thread Rafał Kuć
Hello!

Right, my bad - ids are still using int32. However, that still
gives us 2,147,483,648 possible identifiers right per single index,
which is not close to the 13,5 millions mentioned in the first mail.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Rafal,

 What about docnums, don't they are limited by int32 ?
 07.02.2013 15:33 пользователь Rafał Kuć r@solr.pl написал:

 Hello!

 Practically there is no limit in how many documents can be stored in a
 single index. In your case, as you are using Solr from 2011, there is
 a limitation regarding the number of unique terms per Lucene segment
 (
 http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/fileformats.html#Limitations
 ).
 However I don't think you've hit that. Solr by itself doesn't remove
 documents unless told to do so.

 Its hard to guess what can be the reason and as you said, you see
 updates coming to your handler. Maybe new documents have the same
 identifiers that the ones that are already indexed ? As I said, this
 is only a guess and we would need to have more information. Are there
 any exceptions in the logs ? Do you run delete command ? Are your
 index files changed ? How do you run commit ?

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  I have searched this forum but not yet found a definitive answer, I
 think the
  answer is There is No Limit depends on server specification. But never
 the
  less I will say what I have seen and then ask the questions.

  From scratch (November 2011) I have set up our SOLR which contains data
 from
  various sources, since March 2012 , the number of indexed records (unique
  ID's) reached 13.5 million , which was to be expected. However for the
 last
  8 months the number of records in the index has not gone above 13.5
 million,
  yet looking at the request handler outputs I can safely say at least
  anywhere from 50 thousand to 100 thousand records are being indexed
 daily.
  So I am assuming that earlier records are being removed, and I do not
 want
  that.

  Question: If there is a limit to the number of records the index can
 store
  where do I find this and change it?
  Question: If there is no limit does anyone have any idea why for the last
  months the number has not gone beyond 13.5 million, I can safely say
 that at
  least 90% are new records.

  thanks

  macroman



  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Maximum-Number-of-Records-In-Index-tp4038961.html
  Sent from the Solr - User mailing list archive at Nabble.com.





Re: Error setting up SOLR with Tomcat on Windows

2013-01-28 Thread Rafał Kuć
Hello!

What Gora says is valid. Just point the SimplePostTools to the correct
host, core name and handler and it should work just fine, of course if
Solr is up and running. For example the following works just fine with
Solr running on 8080:

java -Durl=http://localhost:8080/solr/collection1/update -jar post.jar *.xml

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 (a) I am not using jetty, I am using tomcat as mentioned in the subject line
 (b) I am using some other port number and that I have included in the url

 On Monday, January 28, 2013, Gora Mohanty g...@mimirtech.com wrote:
 On 28 January 2013 16:11, Neha Jatav neha.ja...@gmail.com wrote:
 Dear Gora Mohanty,

 I am not using solr-url literally. I am using the localhost url slash
 solr.

 Again, please read the documentation. As mentioned earlier: (a) You
 do not need -Durl=... if using built-in Jetty, (b) and the URL should
 include
 the 8983 port number.

 Regards,
 Gora




Re: Does solr 4.1 support field compression?

2013-01-24 Thread Rafał Kuć
Hello!

It should be turned on by default, because the stored fields
compression is the behavior of the default Lucene 4.1 codec.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi everyone,

 I didn't see any mention of field compression in the release notes for
 Solr 4.1. Did the ability to automatically compress fields end up
 getting added to this release?

 Thanks!,
 Ken



Re: ResultSet Solr

2013-01-23 Thread Rafał Kuć
Hello!

Maybe you are looking to get the results in plain text if you want to
remove all the XML tags ? If that so, you can try adding wt=csv to get
the response as CSV instead of XML.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Thanks. half problem solved. is there anyway i can get rid of the rest:
 response, numFound, start, doc?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035733.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: ResultSet Solr

2013-01-23 Thread Rafał Kuć
Hello!

As far as I know you can't remove the response, numFound, start and
docs. This is how the response is prepared by Solr and apart from
removing the header, you can't do anything. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 no I wanted it in json. i want it to start from where square bracket starts [
 . I want to remove everything before that. I can get it in json by including
 wt=json. I just want to remove Response, numFound, start and docs. 



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035748.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: curl with dynamic url not working

2013-01-21 Thread Rafał Kuć
Hello!

Try surrounding you URL with ' characters, so the whole command looks
like this:

curl
'http://127.0.0.1:8080/solr/newsRSS_DIH?command=full-importamp;clean=falseamp;url=http://www.example.com/news'


-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 While running the following curl command, the url never went with params
 while invoking:

 curl
 http://127.0.0.1:8080/solr/newsRSS_DIH?command=full-importamp;clean=falseamp;url=http://www.example.com/news

 LOG details:
 INFO: [collection1] webapp=/solr path=/newsRSS_DIH
 params={command=full-import} status=0 QTime=11 
 Jan 21, 2013 10:19:14 AM
 org.apache.solr.handler.dataimport.DataImporter
 doFullImport


 If I run the command in browser without curl, it works as expected and
 indexed properly. Please let me know any pointer to resolve this issue.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/curl-with-dynamic-url-not-working-tp4035092.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Priorities on fields

2013-01-16 Thread Rafał Kuć
Hello!

What do you mean by priority ? You can define index or query time
boost. However that will allow to specify the importance of such
field.

A good page to look at is: http://wiki.apache.org/solr/SolrRelevancyCookbook

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi,

 Is it possible to define priorities on fields?

 Lets say I have a product table which has the following fields:

 - id
 - title
 - description
 - code_name

 An entry could be like this:

 id: 42
 title: shinny new shoes
 description: Shinny new shoes made in Italy
 code_name: shinny-new-shoes-42-2013

 Now, I would like to priorities the fields for the search hint. I would
 like to do as follow:

 id: 0.0
 title: 0.8
 description: 0.5
 code_name: 0.1

 Is it possible in SOLR 3.6.1?

 Dariusz



Re: Calculate a sum.

2013-01-14 Thread Rafał Kuć
Hello!

Fetching all the documents, especially for a query that returns many
documents can be a pain.

However there is a StatsComponent
(http://wiki.apache.org/solr/StatsComponent) in Solr, however your
field would have to be numeric and indexed. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 hello.

 My problem is, that i need to calculate a sum of amounts. this amount is in
 my index (stored=true). my php script get all values with paging. but if a
 request takes too long, jetty is killing this process and i get a broken
 pipe.

 Which is the best/fastest way to get the values of many fields from index?
 exists an ResponseHandler for exports? Or which is the fastest?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Calculate-a-sum-tp4033091.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: defaultOperator in schema.xml

2013-01-09 Thread Rafał Kuć
Hello!

You should set the q.op parameter in your request handler
configuration in solrconfig.xml instead of using the default operator
from schema.xml.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 I'm testing out Solr 4.0.  I got the sample schema working, so now I'm
 converting my existing schema (from a Solr 3.4 instance), but I'm confused
 as to what to do for the defaultOperator setting for the solrQueryParser
 field.

 For my existing Solr, I have:
 solrQueryParser defaultOperator=AND/

 The 4.0 schema says that the defaultOperator field is deprecated, and seems
 to suggest that I just pass it along in my queries.  Is there no way I can
 set it to AND by default somewhere else?  I don't control all the
 applications that use my Solr Indexes, and I want to ensure that they
 operate with AND and not OR.

 Thanks!

 -- Chris



Re: No live SolrServers Solr 4 exceptions on trying to create a collection

2013-01-07 Thread Rafał Kuć
Hello!

Can you share the command you use to start all four Solr servers ?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Any clue to why this is happening will be greatly appreciated. This has 
 become a blocker for me.
 I can use the HTTPSolrServer to create a core/make requests etc, but then it 
 behaves like Solr 3.6
 http://host:port/solr/admin/cores and not
 http://host:port/solr/admin/collections

 With my setup (4 servers running at localhost 8983, 8900, 7574 and 7500) when 
 I manually do a
 http://127.0.0.1:7500/solr/admin/cores?action=CREATEname=myColl1instanceDir=defaultdataDir=myColl1Datacollection=myColl1numShards=2
 it creates the collection only at the 7500 server. This is similar
 to when I use HttpSolrServer (Solr 3.6 behavior).

 And of course when I initiate a 
 http://127.0.0.1:7500/solr/admin/collections?action=CREATEname=myColl2instanceDir=defaultdataDir=myColl2Datacollection=myColl2numShards=2
 as expected it creates the collection spread on 2 servers. I am
 failing to achieve the same with SolrJ. As in the code at the bottom
 of the mail, I use CloudSolrServer and get the No live SolrServers exception.

 Any help or direction will of how to create collections (using the
 collections API) using SolrJ will be highly appreciated.

 Regards
 Jay


 -Original Message-
 From: Jay Parashar [mailto:jparas...@itscape.com] 
 Sent: Sunday, January 06, 2013 7:42 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr 4 exceptions on trying to create a collection

 The exception No live SolrServers is being thrown when trying to
 create a new Collection ( code at end of this mail). On the
 CloudSolrServer request method, we have this line
 ClientUtils.appendMap(coll, slices, clusterState.getSlices(coll));
 where coll is the new collection I am trying to create and hence
 clusterState.getSlices(coll)); is returning null.
 And then the loop of the slices which adds to the urlList never
 happens and hence the LBHttpSolrServer created in the
 CloudSolrServer has a null url list in the constructor.
 This is giving the No live SolrServers exception.

 What I am missing?

 Instead of passing the CloudSolrServer to the create.process, if I
 pass the LBHttpSolrServer  (server.getLbServer()), the collection
 gets created but only on one server.

 My code to create a new Cloud Server and new Collection:-

 String[] urls =
 {http://127.0.0.1:8983/solr/,http://127.0.0.1:8900/solr/,http://127.0.0.1:7500/solr/,http://127.0.0.1:7574/solr/};
 CloudSolrServer server = new CloudSolrServer(127.0.0.1:2181, new
 LBHttpSolrServer(urls));
 server.getLbServer().getHttpClient().getParams().setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT,
 5000);
 server.getLbServer().getHttpClient().getParams().setParameter(CoreConnectionPNames.SO_TIMEOUT,
 2); server.setDefaultCollection(collectionName);
 server.connect();
 CoreAdminRequest.Create create = new CoreAdminRequest.Create();
 create.setCoreName(myColl); create.setCollection(myColl);
 create.setInstanceDir(defaultDir);
 create.setDataDir(myCollData);
 create.setNumShards(2);
 create.process(server); //Exception No live SolrServers  is thrown here


 Thanks
 Jay


 -Original Message-
 From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
 Sent: Friday, January 04, 2013 6:08 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 4 exceptions on trying to create a collection

 Tried Wireshark yet to see what host/port it is trying to connect
 and why it fails? It is a complex tool, but well worth learning.

 Regards,
   Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening
 all at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD 
 book)


 On Fri, Jan 4, 2013 at 6:58 PM, Jay Parashar jparas...@itscape.com wrote:

 Thanks! I had a different version of httpclient in the classpath. So 
 the 2nd exception is gone but now I am  back to the first one 
 org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
 available to handle this request

 -Original Message-
 From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
 Sent: Friday, January 04, 2013 4:21 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 4 exceptions on trying to create a collection

 For the second one:

 Wrong version of library on a classpath or multiple versions of 
 library on the classpath which causes wrong classes with missing 
 fields/variables? Or library interface baked in and the implementation 
 is newer. Some sort of mismatch basically. Most probably in Apache http 
 library.

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all 
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)

Re: Is there any C API for Solr??

2013-01-02 Thread Rafał Kuć
Hello!

Although not C, but C++ there is a project aiming at this - 
http://code.google.com/p/solcpp/

However, I don't know how usable it is; you can also just make pure HTTP
calls and process the response.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi All,

 Is there any C API for Solr??

 Thanks and regards,
 Romita



Re: What do I need to research to solve the problem of returning good results for a generic term?

2012-12-28 Thread Rafał Kuć
Hello!

Ahmet I think it is available here: 
http://archive.apachecon.com/eu2012/presentations/07-Wednesday/L1R-Lucene/

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi David,

 Mikhail tackled a similar problem. 
  
 http://www.apachecon.eu/schedule/presentation/18/

 Where we can find full presentation?

 --- On Fri, 12/28/12, David Parks davidpark...@yahoo.com wrote:

 From: David Parks davidpark...@yahoo.com
 Subject: What do I need to research to solve the problem of returning good 
 results for a generic term?
 To: solr-user@lucene.apache.org
 Date: Friday, December 28, 2012, 11:40 AM
 I'm sure this is a complex problem
 requiring many iterations of work, so I'm
 just looking for pointers in the right direction of research
 here.
 
  
 
 I have a base term, such as let's say black dress that I
 might search for.
 Someone searching on this term is most logically looking for
 black dresses.
 In my dataset I have black dresses, but I also have many CDs
 with the term
 black dress in them (it's not so uncommon of a song
 title). 
 
  
 
 I would want the CDs to show up if I search for a more
 specific term like
 black dress CD, but I would want the black dresses to show
 up for the less
 specific term black dress.
 
  
 
 Google image search is excellent at handling this example. A
 pretty vanilla
 installation of Solr isn't yet great at it.
 
  
 
 So. just looking for a nudge in the right direction here.
 What should I go
 read up on first to start learning how to improve on these
 results?
 
  
 
  
 




Re: facet query

2012-12-21 Thread Rafał Kuć
Hello!

Try facet.mincount=1, that should help.
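
For example, with your parameters (a sketch assuming the default /select
handler and localhost):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=office&facet.field=name&facet.mincount=1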

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi, is there a way with facets to only return facet values that are not
 0? I have facet=true&facet.field=office&facet.field=name as my
 facet parameters, and with some of my queries it brings back people that have 
 a value of 0.
 Thanks



Re: Getting error. org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler

2012-12-18 Thread Rafał Kuć
Hello!

It seems that Tomcat can't see the jars with the DataImportHandler.
Did you try adding a lib tag to your solrconfig.xml, i.e. something
like this:

<lib dir="/usr/share/solr/lib/" regex="apache-solr-dataimporthandler-\d.*\.jar" />

And putting the apache-solr-dataimporthandler-3.6.1.jar and
apache-solr-dataimporthandler-extras-3.6.1.jar to /usr/share/solr/lib
?


-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 I am new to solr. I have installed apache tomcat 7.0 on my server and I
 have solr 3.6.1 on the server.

 I have solr-home folder set by network guys on my D:\ drive. The folders in
 that are: bin,etc,logs,multicore,webapps.

 In the multicore folder there are:
 core0,core1,exampledocs,README.txt and
 solr.xml. In webapps folder I have solr.war file nothing else.

 Now I put one more core folder, named ConfigUserTextUpdate, in the
 multicore folder, which has a conf folder in it, restart the tomcat
 service, and I can see the new core on localhost/solr.

 Now I add db-config.xml to the ConfigUserTextUpdate core. Below is the
 content:

 <dataConfig>
 <dataSource
 type="JdbcDataSource"
 driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
 url="jdbc:sqlserver://dbname\servername;databaseName=dbname"
 user="username"
 password="password"/>
 <document>
 <entity name="ConfigUserTextUpdate" query="UserTextUpdate_Index">
 </entity>
 </document>
 </dataConfig>

 Up to here everything is fine and all three cores are shown on
 localhost/solr. Now in the solrconfig.xml of the core ConfigUserTextUpdate
 I add the line

   <requestHandler name="/dataimport"
 class="org.apache.solr.handler.dataimport.DataImportHandler">
 <lst name="defaults">
   <str name="config">db-data-config.xml</str>
 </lst>
   </requestHandler>

 And it starts giving the error as shown below:

 HTTP Status 500 - Severe errors in solr configuration. Check your log
 files for more detailed information on what may be wrong. If you

 want solr to continue after configuration errors, change:
 <abortOnConfigurationError>false</abortOnConfigurationError> in
 solr.xml -
 org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.dataimport.DataImportHandler' at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:394)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:419) at
 org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:455) at
 org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:159)
 at org.apache.solr.core.SolrCore.(SolrCore.java:563) at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:480) at
 org.apache.solr.core.CoreContainer.load(CoreContainer.java:332) at
 org.apache.solr.core.CoreContainer.load(CoreContainer.java:216) at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
 at
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
 at
 org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4650)
 at
 org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5306)
 at
 org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) at
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
 at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
 at
 org.apache.catalina.core.StandardHost.addChild(StandardHost.java:633) at
 org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:977) at
 org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1655)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at
 java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at
 java.util.concurrent.FutureTask.run(Unknown Source) at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at
 java.lang.Thread.run(Unknown Source) Caused by:
 java.lang.ClassNotFoundException:
 org.apache.solr.handler.dataimport.DataImportHandler at
 java.net.URLClassLoader$1.run(Unknown Source) at
 java.security.AccessController.doPrivileged(Native Method) at
 java.net.URLClassLoader.findClass(Unknown Source) at
 java.lang.ClassLoader.loadClass(Unknown Source) at
 java.net.FactoryURLClassLoader.loadClass(Unknown Source) at
 java.lang.ClassLoader.loadClass(Unknown Source) at
 java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown
 Source) at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:378)
 ... 27 more

 I have

Re: user session id / cookie to record search query

2012-11-21 Thread Rafał Kuć
Hello!

You want it to be written into logs ? If that is the case you can just
add additional parameter, that is not recognized by Solr, for example
'userId' and send a query like this:

http://localhost:8983/solr/select?q=*:*&userId=user1

In the logs you should see something like that:
INFO: [collection1] webapp=/solr path=/select
params={q=*:*&userId=user1} hits=0 status=0 QTime=1

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi All,

 Does anyone have an idea how to use a user session id / cookie to record 
 the search query from that particular user? 

 Thanks and regards,
 Romita



Re: user session id / cookie to record search query

2012-11-21 Thread Rafał Kuć
Hello!

What Solr are you using ? If not 4.0, information on logging can be
found on the wiki - http://wiki.apache.org/solr/SolrLogging

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hello Rafał Kuć

 Thanks a lot for your guidance. I am not quite sure how to collect the
 logs. Could you please help?

 Romita 



 From:   Rafał Kuć r@solr.pl
 To: solr-user@lucene.apache.org, 
 Date:   11/21/2012 04:57 PM
 Subject:Re: user session id / cookie to record search query



 Hello!

 You want it to be written into logs ? If that is the case you can just
 add additional parameter, that is not recognized by Solr, for example
 'userId' and send a query like this:

 http://localhost:8983/solr/select?q=*:*&userId=user1

 In the logs you should see something like that:
 INFO: [collection1] webapp=/solr path=/select
 params={q=*:*&userId=user1} hits=0 status=0 QTime=1



Re: Pls help: Very long query - what to do?

2012-11-21 Thread Rafał Kuć
Hello!

If you really need a query that long, then one option is to
increase the allowed header length in Jetty. Add the following to your
Jetty connector configuration:

<Set name="headerBufferSize">16384</Set>
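
For example, in the etc/jetty.xml that ships with the Solr example, the
connector section would look roughly like this (a sketch; your connector
class and port may differ):

<New class="org.mortbay.jetty.bio.SocketConnector">
  <Set name="port">8983</Set>
  <Set name="headerBufferSize">16384</Set>
</New>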

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 my query is like this, see below. I already use a POST request.

 i got a solr exception:
 org.apache.solr.client.solrj.SolrServerException: Server at
 http://server:7056/solr returned non ok status:400, message:Bad Request

 is there a way to prevent this?

 id:(ModuleImpl@20117 OR ModuleImpl@37886 OR ModuleImpl@9379 OR
 ModuleImpl@37906 OR ModuleImpl@19969 OR ModuleImpl@37936 OR
 ModuleImpl@115568 OR ModuleImpl@19901 OR ModuleImpl@115472 OR
 ModuleImpl@20044 OR ModuleImpl@25168 OR ModuleImpl@38026 OR
 ModuleImpl@115647 OR ModuleImpl@115648 OR ModuleImpl@115649 OR
 ModuleImpl@20045 OR ModuleImpl@25169 OR ModuleImpl@38031 OR
 ModuleImpl@115650 OR ModuleImpl@21090 OR ModuleImpl@38037 OR
 ModuleImpl@117097 OR ModuleImpl@21091 OR ModuleImpl@38038 OR
 ModuleImpl@117098 OR ModuleImpl@117099 OR ModuleImpl@19973 OR
 ModuleImpl@38040 OR ModuleImpl@115571 OR ModuleImpl@115572 OR
 ModuleImpl@115573 OR ModuleImpl@21092 OR ModuleImpl@38135 OR
 ModuleImpl@117100 OR ModuleImpl@21093 OR ModuleImpl@38136 OR
 ModuleImpl@117101 OR ModuleImpl@117102 OR ModuleImpl@19979 OR
 ModuleImpl@38140 OR ModuleImpl@115581 OR ModuleImpl@19980 OR
 ModuleImpl@38143 OR ModuleImpl@115582 OR ModuleImpl@115583 OR
 ModuleImpl@21094 OR ModuleImpl@38223 OR ModuleImpl@117104 OR
 ModuleImpl@117105 OR ModuleImpl@117106 OR ModuleImpl@117107 OR
 ModuleImpl@117108 OR ModuleImpl@21095 OR ModuleImpl@38224 OR
 ModuleImpl@117109 OR ModuleImpl@19920 OR ModuleImpl@25157 OR
 ModuleImpl@38240 OR ModuleImpl@115493 OR ModuleImpl@20139 OR
 ModuleImpl@38286 OR ModuleImpl@115752 OR ModuleImpl@21096 OR
 ModuleImpl@38327 OR ModuleImpl@117111 OR ModuleImpl@117112 OR
 ModuleImpl@117113 OR ModuleImpl@21097 OR ModuleImpl@38328 OR
 ModuleImpl@117114 OR ModuleImpl@19989 OR ModuleImpl@25166 OR
 ModuleImpl@38332 OR ModuleImpl@115585 OR ModuleImpl@115586 OR
 ModuleImpl@19990 OR ModuleImpl@38339 OR ModuleImpl@115587 OR
 ModuleImpl@115588 OR ModuleImpl@115589 OR ModuleImpl@115590 OR
 ModuleImpl@115591 OR ModuleImpl@115592 OR ModuleImpl@115593 OR
 ModuleImpl@115594 OR ModuleImpl@115595 OR ModuleImpl@19807 OR
 ModuleImpl@38365 OR ModuleImpl@115365 OR ModuleImpl@115366 OR
 ModuleImpl@19808 OR ModuleImpl@38373 OR ModuleImpl@115367 OR
 ModuleImpl@115368 OR ModuleImpl@115369 OR ModuleImpl@115370 OR
 ModuleImpl@115371 OR ModuleImpl@21121 OR ModuleImpl@38418 OR
 ModuleImpl@117132 OR ModuleImpl@117133 OR ModuleImpl@117134 OR
 ModuleImpl@732 OR ModuleImpl@38438 OR ModuleImpl@117115 OR
 ModuleImpl@21099 OR ModuleImpl@38440 OR ModuleImpl@117116 OR
 ModuleImpl@19929 OR ModuleImpl@38450 OR ModuleImpl@115501 OR
 ModuleImpl@115502 OR ModuleImpl@19810 OR ModuleImpl@38471 OR
 ModuleImpl@115372 OR ModuleImpl@115373 OR ModuleImpl@21124 OR
 ModuleImpl@38529 OR ModuleImpl@117135 OR ModuleImpl@117136 OR
 ModuleImpl@117137 OR ModuleImpl@117138 OR ModuleImpl@19931 OR
 ModuleImpl@115505 OR ModuleImpl@21074 OR ModuleImpl@38546 OR
 ModuleImpl@117077 OR ModuleImpl@19934 OR ModuleImpl@38548 OR
 ModuleImpl@115507 OR ModuleImpl@115508 OR ModuleImpl@115509 OR
 ModuleImpl@115510 OR ModuleImpl@20550 OR ModuleImpl@38607 OR
 ModuleImpl@115885 OR ModuleImpl@21127 OR ModuleImpl@38638 OR
 ModuleImpl@117139 OR ModuleImpl@21077 OR ModuleImpl@25182 OR
 ModuleImpl@38657 OR ModuleImpl@117078 OR ModuleImpl@117079 OR
 ModuleImpl@117080 OR ModuleImpl@19938 OR ModuleImpl@38658 OR
 ModuleImpl@115516 OR ModuleImpl@115517 OR ModuleImpl@115518 OR
 ModuleImpl@115519 OR ModuleImpl@19864 OR ModuleImpl@115432 OR
 ModuleImpl@19769 OR ModuleImpl@38695 OR ModuleImpl@115320 OR
 ModuleImpl@20556 OR ModuleImpl@38720 OR ModuleImpl@20494 OR
 ModuleImpl@38736 OR ModuleImpl@19871 OR ModuleImpl@115438 OR
 ModuleImpl@21056 OR ModuleImpl@38771 OR ModuleImpl@19775 OR
 ModuleImpl@19776 OR ModuleImpl@38802 OR ModuleImpl@115330 OR
 ModuleImpl@115331 OR ModuleImpl@115332 OR ModuleImpl@20566 OR
 ModuleImpl@38835 OR ModuleImpl@115889 OR ModuleImpl@115890 OR
 ModuleImpl@20501 OR ModuleImpl@38846 OR ModuleImpl@115869 OR
 ModuleImpl@115870 OR ModuleImpl@21107 OR ModuleImpl@38859 OR
 ModuleImpl@117118 OR ModuleImpl@19879 OR ModuleImpl@38871 OR
 ModuleImpl@115444 OR ModuleImpl@115445 OR ModuleImpl@21058 OR
 ModuleImpl@38873 OR ModuleImpl@19823 OR ModuleImpl@25153 OR
 ModuleImpl@38896 OR ModuleImpl@115396 OR ModuleImpl@115397 OR
 ModuleImpl@19779 OR ModuleImpl@38904 OR ModuleImpl@115334 OR
 ModuleImpl@115335 OR ModuleImpl@115336 OR ModuleImpl@20574 OR
 ModuleImpl@38932 OR ModuleImpl@115892 OR ModuleImpl@115893 OR
 ModuleImpl@20504 OR ModuleImpl@38941 OR ModuleImpl@115871 OR
 ModuleImpl@115872 OR ModuleImpl@21083 OR ModuleImpl@38962 OR
 ModuleImpl@117081 OR ModuleImpl@117082 OR ModuleImpl@19884 OR
 ModuleImpl@38969 OR ModuleImpl@115449

Re: SolrCloud and external Zookeeper ensemble

2012-11-21 Thread Rafał Kuć
Hello!

As I said, I wouldn't use the ZooKeeper that is embedded in Solr, but
rather set up a standalone one. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 First of all: thank you for your answers. Yes, I meant a side by side
 configuration. I think the worst case for the ZKs here is to lose two of them.
 However, I'm going to use 4 availability zones in the same region, so at least
 this will reduce the risk of losing both of them at the same time.
 Regards.

 On 21 November 2012 17:06, Rafał Kuć r@solr.pl wrote:

 Hello!

 Zookeeper by itself is not demanding, but if something happens to your
 nodes that have Solr on them, you'll lose ZooKeeper too if you have
 them installed side by side. However, if you have 4 Solr nodes and
 3 ZK instances you can get them running side by side.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Separate is generally nice because then you can restart Solr nodes
  without consideration for ZooKeeper.

  Performance-wise, I doubt it's a big deal either way.

  - Mark

  On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:

  Hi,
 
  I have 4 solr collections, 2-3mn documents per collection, up to 100K
  updates per collection daily (roughly). I'm going to create a 4-node
  SolrCloud on Amazon's m1.large instances (7GB mem, 2x2.4GHz cpu each).
  The question is what about zookeeper? It's going to be an external
  ensemble, but is it better to use the same nodes as solr or dedicated
  micro instances? Zookeeper does not seem to be a resource-demanding
  process, but what would be better in this case ? To keep it inside of
  solrcloud or separately (micro instances seem to be enough here) ?
 
  Thanks in advance.
  Regards.





Re: Additional field informations?

2012-11-20 Thread Rafał Kuć
Hello!

I think that's not possible out of the box in Solr. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hello all,
 We import xml documents to solr with solrj. We use xsl to process the
 objects to fields.
 We have the language information in our objects.
 After xsl our documents look like this:
 <add>
  <doc>
  ...
  <field name="title">german title</field>
  <field name="title">english title</field>
  <field name="title">french title</field>
  ...
  </doc>
 </add>


 Our schema.xml looks like this. (we use it as a filter too..)
 ...
 <field name="title" type="string" indexed="true" stored="true" 
 multiValued="true" />
 ...

 Our results look like this. (we want to transform it directly to html 
 with xsl)
 <arr name="title">
 <str>german title</str>
 <str>english title</str>
 <str>french title</str>
 </arr>

 Is there any possibility to get a result like this:
 <arr name="title">
  <str lang="de">german title</str>
  <str lang="en">english title</str>
  <str lang="fr">french title</str>
 </arr>



Re: SOLR USING 100% percent CPU and not responding after a while

2012-11-20 Thread Rafał Kuć
Hello!

How about some information on how the garbage collector behaves during
the time when Solr is unresponsive ?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hello,
 We were doing fine with memory, comparitively. This was in a one
 hour span of load testing we did. And no out of memory exceptions
 seen in the log. I have copied the memory chart here. It shows that
 most of the memory was being used for cache. 
 Thank you


 -Original Message-
 From: Rafał Kuć [mailto:r@solr.pl] 
 Sent: Tuesday, November 20, 2012 11:25 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOLR USING 100% percent CPU and not responding after a while

 Hello!

 The first thing one should ask is what is the cause of the 100% CPU
 utilization. It can be an issue with the memory or your queries may
 be quite resource intensive. Can you provide some more information
 about your issues ? Do you see any OutOfMemory exceptions in your
 logs ? How is the JVM garbage collector behaving when you are
 experiencing Solr being unresponsive ?

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hello all,
 We have solr 3.6 in a tomcat server with 16 GB memory allocated to it 
 on a Linux server. It is a multicore solr and one of the cores has over 
 100 million records. We have a web application that is fed by this 
 solr. We don't use sharding; different cores are for different 
 purposes. There are all kinds of queries involved: simple queries, 
 queries with filters, queries with facets, queries with pagination 
 (deep pagination as well) and a combination of them. When there are
 100 concurrent users and quite some queries thrown at the solr, the CPU 
 utilization by the solr exceeds 100% and it becomes unresponsive.
 Has anyone from this group had this problem and found a 
 solution? We did look at the queries from the logs and there is some 
 optimization to be done there, but is there anything we can do in the 
 configuration? I know that there is a parameter called mergeFactor 
 which can be set low in the solrconfig.xml to enable faster 
 searching, but I don't see anything about CPU utilization related to 
 that setting. So my question, in short, is: are there any CPU 
 utilization related configuration settings for SOLR?
 Any help will be appreciated a lot
 Thank you a bunch
 Biva





Re: Handle Queries which return 1000s of records

2012-11-12 Thread Rafał Kuć
Hello!

By pieces you mean by paging the results ? If yes, please look at
http://wiki.apache.org/solr/CommonQueryParameters - start and rows
parameters.
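
For example (a sketch assuming the default /select handler), to fetch the
results 20 at a time:

http://localhost:8983/solr/select?q=your+keywords&start=0&rows=20

and then start=20, start=40 and so on for the following pages.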

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi,

 I am integrating solr search into my website. I have some keywords that
 would return 1000s of records. Is there any way in solr by which I can get
 the results in pieces?

 Pratyul



Re: How to insert documents into different indexes

2012-11-11 Thread Rafał Kuć
Hello!

Just use the update handler that is specific to a given core. For
example if you have two cores named core1 and core2, you should use
the following addresses (if you didn't change the default
configuration):

/solr/core1/update/

and

/solr/core2/update/
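
For example, to post a document file to core1 only (a sketch using curl and
the XML update format):

curl "http://localhost:8983/solr/core1/update?commit=true" -H "Content-Type: text/xml" --data-binary @docs.xml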

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi, 

 I've set up a Solr instance with multiple cores to be able to use
 different indexes for different applications. The point I'm struggling
 with is how to insert documents into the index running on a specific
 core. Any clue appreciated.

 best




Re: Sorting by boolean function

2012-11-07 Thread Rafał Kuć
Hello!

From looking at your sort parameter it seems that you are missing the
order of sorting. The function query will return a value on which
Solr will sort and Solr needs to know if the documents should be
sorted in descending or ascending order. Try something like that:

sort=if(exists(start_time:[* TO 1721] AND end_time:[1721 TO
*]),100,0)+desc

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi All,

 I am trying to sort by a boolean function to sort the places that are open
 and not open as given below

 sort=if(exists(start_time:[* TO 1721] AND end_time:[1721 TO *]),100,0)

 However I keep getting an error message saying
 Can't determine a Sort Order (asc or desc) in sort spec
 'if(exists(start_time:[* TO 1721] AND end_time:[1721 TO *]),100,0) asc',
 pos=23

 I assume that boolean functions can be used to sort. Is there anything
 missing from the sort function ?

 Thanks,
 Indika



Re: Sorting by boolean function

2012-11-07 Thread Rafał Kuć
Hello!

Ahhh, sorry. I see now. The function query you are specifying for the
sort is not proper. Solr doesn't understand that part:

exists(start_time:[* TO 1721] AND end_time:[1721 TO *])

try something like that:

sort=if(exists(query($innerQuery)),100,0)+desc&innerQuery={!edismax}(start_time:[*+TO+1721]
AND end_time:[1721+TO+*])

However remember that exists function is only Solr 4.0.

I don't have any example data with me right now, so I can't
say if it will work for sure, but clean Solr 4.0 doesn't cry about
errors now :) 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Thanks for the response, I did append the sort order but the result was the
 same,

 Can't determine a Sort Order (asc or desc) in sort spec
 'if(exists(start_time:[* TO 1721] AND end_time:[1721 TO *]),100,0) asc',
 pos=23

 Thanks,
 Indika



 On 8 November 2012 04:02, Rafał Kuć r@solr.pl wrote:

 Hello!

 From looking at your sort parameter it seems that you are missing the
 order of sorting. The function query will return a value on which
 Solr will sort and Solr needs to know if the documents should be
 sorted in descending or ascending order. Try something like that:

 sort=if(exists(start_time:[* TO 1721] AND end_time:[1721 TO
 *]),100,0)+desc

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi All,

  I am trying to sort by a boolean function to sort the places that are
 open
  and not open as given below

  sort=if(exists(start_time:[* TO 1721] AND end_time:[1721 TO *]),100,0)

  However I keep getting an error message saying
  Can't determine a Sort Order (asc or desc) in sort spec
  'if(exists(start_time:[* TO 1721] AND end_time:[1721 TO *]),100,0) asc',
  pos=23

  I assume that boolean functions can be used to sort. Is there anything
  missing from the sort function ?

  Thanks,
  Indika





Re: Retrieve unique documents on a non id field

2012-11-07 Thread Rafał Kuć
Hello!

Look at the field collapsing functionality - 
http://wiki.apache.org/solr/FieldCollapsing

It allows you to group documents based on field value, query or
function query.
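
A sketch with your query (assuming the default /select handler):

q=(start_time:[* TO 1100] AND end_time:[1100 TO *])&group=true&group.field=restaurant_id&group.limit=1

This collapses the Breakfast and Lunch documents of the same restaurant into
a single group and returns one document per restaurant_id.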

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi All,

 Currently I am using Solr for searching and filtering restaurants based on
 certain criteria. For example I use Solr to obtain the list of restaurants
 open in the day.

 A restaurant can have sessions when it's open, e.g. Breakfast, Lunch and
 Dinner, and the time information related to these sessions are stored in
 three different documents with a field to identify the restaurant
 (restaurant_id).

 When I query for restaurants that are open at 11:00 AM using (start_time:[*
 TO 1100] AND end_time:[1100 TO *]) and if sessions overlap (say Breakfast
 and Lunch) I would get both the Breakfast document and the Lunch document.
 These are in fact two different documents and would have the same
 restaurant_id.

 My question is, is there a way to retrieve only one document if the
 restaurant_id is repeated in the response?

 Thanks,
 Indika



Re: SOLR 3.5 sometimes throws java.lang.NumberFormatException: For input string: java.math.BigDecimal:1848.66

2012-10-31 Thread Rafał Kuć
Hello!

Look at what Solr returns in the error - you are sending the following
value: java.math.BigDecimal:1848.66 - remove the java.math.BigDecimal:
prefix and your problem should be gone.
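
If you build the documents with SolrJ, a minimal sketch (assuming a
hypothetical BigDecimal variable named amount) is to convert the value
explicitly before adding it:

// add the plain numeric value, not an object whose string form carries the class name
doc.addField("AMOUNT", amount.doubleValue()); // or amount.toPlainString()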

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Welcome all,

 We have a very strange problem with SOLR 3.5. It SOMETIMES throws exceptions:

 2012-10-31 10:20:06,408 SEVERE [org.apache.solr.core.SolrCore:185]
 (http-10.205.49.74-8080-155) org.apache.solr.common.SolrException:
 ERROR: [doc=MyDoc # d3mo1351674222122-1 # 2012-10-31 08:03:42.122]
 Error adding field 'AMOUNT'='java.math.BigDecimal:1848.66'
 at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
 at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
 at
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:53)
 at
 my.package.solr.dihutils.DateCheckUpdateProcessorFactory$DateCheckUpdateProcessor.processAdd(DateCheckUpdateProcessorFactory.java:91)
 at
 org.apache.solr.handler.BinaryUpdateRequestHandler$2.document(BinaryUpdateRequestHandler.java:79)
 at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
 at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:129)
 at
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:211)
 at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:114)
 at
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176)
 at
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:102)
 at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:144)
 at
 org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:69)
 at
 org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:45)
 at
 org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:56)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:190)
 at
 org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:92)
 at
 org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
 at
 org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:598)
 at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.NumberFormatException: For input string:
 java.math.BigDecimal:1848.66
 at
 sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
 at java.lang.Double.parseDouble(Double.java:510)
 at
 org.apache.solr.schema.TrieField.createField(TrieField.java:418)
 at
 org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
 at
 org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203

Re: facet prefix with tokenized fields

2012-10-29 Thread Rafał Kuć
Hello!

Do you have to use faceting for prefixing ? Maybe it would be better to use 
an ngram-based field and return the stored value ?
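
A minimal sketch of such a field type (assuming solr.EdgeNGramFilterFactory;
tune the gram sizes to your needs):

<fieldType name="text_autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

A query like category_ac:chi then matches the grams of "children", and you
can simply return the stored field, which holds the original "toys for children".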


-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  
 Hi.

 Is there any solution to facet documents with a specified prefix on
 some tokenized field, but get the original value of the field in the result?




 e.q.: 

 <field name="category_ac" type="text_autocomplete" indexed="true"
 stored="true" multiValued="true" />

 <fieldType name="text_autocomplete" class="solr.TextField">
 <analyzer>
 <tokenizer
 class="solr.StandardTokenizerFactory" />
 <filter class="solr.LowerCaseFilterFactory" />
 </analyzer>
 </fieldType>







 Indexed value: toys for children

 query:
 q=&start=0&rows=0&facet.limit=-1&facet.mincount=1&f.category_ac.facet.prefix=chi&facet.field=category_ac&facet=true




 I'd like to get exactly toys for children, not children




 --

 Grzegorz Sobczyk





Re: select one document with a given value for an attribute in a single page

2012-10-29 Thread Rafał Kuć
Hello!

Try the grouping feature of Solr -
http://wiki.apache.org/solr/FieldCollapsing . You can collapse
documents based on the field value, which would be a shop identifier
in your case.
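
A sketch of the request parameters (assuming the default /select handler and
your related_shop_id field):

q=your+query&group=true&group.field=related_shop_id&group.limit=1&rows=26

With grouping, rows controls the number of groups returned, so this gives 26
shops with a single offer each.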

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi all,

 I have the following schema

 offer:
 - offer_id
 - offer_title
 - offer_description
 - related_shop_id
 - related shop_name
 - offer_price

 Each offer is related to a shop.
 In one shop, we have many offers

 I would like to show in one page (26 offers) only one offer from a shop.

 I need help to implement this feature. 


 Thanks



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/select-one-document-with-a-given-value-for-an-attribute-in-a-single-page-tp4016679.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Having problem runing apache-solr on my linux server

2012-10-29 Thread Rafał Kuć
Hello!

What you are experiencing is normal - when you run an application in
the foreground it will close when you close the terminal. You can use
a command like screen, or you can use nohup. However I would advise
installing a standalone Jetty or Tomcat and just deploy Solr as any
other web application.
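
If you stay with the bundled Jetty, a nohup sketch (run from the directory
that contains start.jar) would be:

nohup java -jar start.jar > solr.log 2>&1 &

This detaches the process from the terminal and writes its output to solr.log.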

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch


 hello dear,
 I try running apache-solr 3.6.1 on my linux host using java -jar
 start.jar, everything is running fine.
 but java stop running as soon as the command terminal close. i have
 searching for means of making it run continuously but i did not see.

 please i need help on making apache-solr running without stopping at the 
 terminal close.

 thanks.
 Zakari M



Re: Is it possible to use something like sum() in a solr-query?

2012-10-29 Thread Rafał Kuć
Hello!

Take a look at StatsComponent - http://wiki.apache.org/solr/StatsComponent
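
A sketch of such a request (assuming a numeric field named price and the
default /select handler):

http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price

The sum entry in the stats section of the response is the equivalent of
SQL's SUM() over all matching documents.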

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 I have for example an integer field and want to sum all these values for
 all the matching documents.

 Similar to this in sql:

 SELECT SUM(expression)
 FROM tables
 WHERE predicates;


 Regards,
 Markus

 On 29.10.2012 22:25, Jack Krupansky wrote:
 Unfortunately, neither the subject nor your message says it all. Be 
 specific - what exactly do you want to sum? All matching docs? Just 
 the returned docs? By group? Or... what?

 You can of course develop your own search component that does whatever 
 it wants with the search results.

 -- Jack Krupansky

 -Original Message- From: Markus.Mirsberger
 Sent: Monday, October 29, 2012 11:08 AM
 To: solr-user@lucene.apache.org
 Subject: Is it possible to use something like sum() in a solr-query?

 Hi,

 the subject says it all :)
 Is there something like sum() available in a solr query to sum all
 values of a field ?

 Regards,
 Markus Mirsberger




Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rafał Kuć
Hello!

You try to put the HTML into the XML sent to Solr right ? You should
use the proper UTF-8 encoding to do that. For example look at the
utf8-example.xml file from the exampledocs directory that comes with
Solr and you'll see something like this:

<field name="features">tag with escaped chars: &lt;nicetag/&gt;</field>

As you can see, the < and > are properly encoded as &lt; and &gt;

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi,

 I am using Solr 4.0.0. I have HTML content as the description of a product.
 If I index it without any filtering, it gives errors on search.
 How can I filter HTML content?

 Pratyul



Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rafał Kuć
Hello!

You don't need a custom update request processor - there is a char
filter dedicated to stripping HTML tags from your content and indexing only
the relevant parts of it - 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

However, you first need to properly send it to Solr for indexing. 
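
A minimal sketch of a field type using that char filter (adjust the tokenizer
and filters to your needs):

<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>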

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 I think you will have to write an UpdateProcessor to strip out html tags.

 http://wiki.apache.org/solr/UpdateRequestProcessor

 As of Solr 4.0 you can also use scripting languages like Python, Ruby and
 Javascript to write scripts for use as updateprocessors too.

 -Original Message- 
 From: Pratyul Kapoor
 Sent: Friday, October 26, 2012 3:56 AM
 To: solr-user@lucene.apache.org
 Subject: Filtering HTML content in Solr 4.0.0

 Hi,

 I am using Solr 4.0.0. I have HTML content as the description of a product.
 If I index it without any filtering, it gives errors on search.
 How can I filter HTML content?

 Pratyul 



Re: Occasional Solr performance issues

2012-10-22 Thread Rafał Kuć
Hello!

You can check if long warming is causing the overlapping
searchers. Check the Solr admin panel and look at the cache statistics;
there should be a warmupTime property.

Lowering the autowarmCount should lower the time needed to warm up;
however, you can also look at your warming queries (if you have any)
and see how long they take.
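
The autowarmCount is set per cache in solrconfig.xml, for example (a sketch
with illustrative values):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32" />

Lowering autowarmCount (here 32) means fewer entries get re-populated when a
new searcher is opened, which shortens the warmup.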

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 When Solr is slow, I'm seeing these in the logs:
 [collection1] Error opening new searcher. exceeded limit of
 maxWarmingSearchers=2, try again later.
 [collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

 Googling, I found this in the FAQ:
 Typically the way to avoid this error is to either reduce the
 frequency of commits, or reduce the amount of warming a searcher does
 while it's on deck (by reducing the work in newSearcher listeners,
 and/or reducing the autowarmCount on your caches)
 http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

 I happen to know that the script will try to commit once every 60
 seconds. How does one reduce the work in newSearcher listeners? What
 effect will this have? What effect will reducing the autowarmCount on
 caches have?

 Thanks.



Re: Error: _version_field must exist in schema

2012-10-18 Thread Rafał Kuć
Hello!

Look at your solrconfig.xml file, you should see something like that:

<updateLog>
 <str name="dir">${solr.data.dir:}</str>
</updateLog>

Just remove it and Solr shouldn't bother you with the version field
information. However remember that some features won't work (like the
real time get or partial documents update).

You can also add _version_ field to your schema and forget about it.
You don't need to do anything with it as it is used internally by
Solr. 
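
The field definition itself is a single line in schema.xml:

<field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />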

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 On Thu, Oct 18, 2012 at 12:25 AM, Rafał Kuć r@solr.pl wrote:
 Hello!

 You can find some information about the requirements of SolrCloud at
 http://wiki.apache.org/solr/SolrCloud . I don't know if _version_ is
 mentioned elsewhere.

 As for Websolr - I'm afraid I can't say anything about the cause of
 those errors without seeing the exception.


 I see, thanks. I don't think that I'm using the SolrCloud feature. Is
 it enabled because there exist solr/collection1 and also
 multicore/core0?



Re: What does _version_ field used for?

2012-10-17 Thread Rafał Kuć
Hello!

It is used internally by Solr, for example by features like partial
update functionality and update log.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 I am moving to solr 4.0 from the beta version. An exception was thrown:

 Caused by: org.apache.solr.common.SolrException: _version_field must exist
 in schema, using indexed=true stored=true and multiValued=false
 (_version_ does not exist)
 at
 org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:606)
 ... 26 more
 2

 It seems that there needs to be a field like
  <field name="_version_" type="long" indexed="true" stored="true"/>
 in schema.xml. I wonder what this is used for?


