Re: PathHierarchyTokenizerFactory storage for pivot facets

2014-01-06 Thread Ahmet Arslan
Hi Stephan,

If you want to populate some fields from path info, an
UpdateRequestProcessorFactory is a better fit. You can use
URLClassifyProcessor(Factory).java as an example.
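
For illustration, a minimal sketch of such a processor (an editorial
addition, not part of the original reply; the class name, the source field
"path", and the urllevelN target fields are assumptions):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Copies each level of a slash-separated path into urllevel1..urllevelN.
public class PathLevelsProcessor extends UpdateRequestProcessor {

  public PathLevelsProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object path = doc.getFieldValue("path");
    if (path != null) {
      String[] levels = path.toString().split("/");
      for (int i = 0; i < levels.length; i++) {
        // a/b/c/d -> urllevel1=a, urllevel2=b, ...
        doc.setField("urllevel" + (i + 1), levels[i]);
      }
    }
    super.processAdd(cmd);  // hand the document to the rest of the chain
  }
}

It would be wired in through a matching UpdateRequestProcessorFactory and an
updateRequestProcessorChain in solrconfig.xml, the same way
URLClassifyProcessorFactory is.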

Ahmet


On Monday, January 6, 2014 7:46 AM, Stephan Schubert 
m...@stephan-schubert.com wrote:
I want to store the levels of a path/URL in separate fields to make use
of pivot faceting. I thought about using the
PathHierarchyTokenizerFactory for that. But how can I store the several
levels of a URL in separate fields?

Example:

Doc1 - Path: a/b/c/d
Doc2 - Path: f/g/h/i

Document 1 should store the value of a in a field named something like
urllevel1, b in field urllevel2, c in urllevel3, and so on.
The same for document 2: f in field urllevel1, g in urllevel2, and h in
urllevel3.

Is the PathHierarchyTokenizerFactory the right approach for that? I know
the PathHierarchyTokenizerFactory splits the path up, but I don't know
how I can store the several levels in the specific fields and set it up
in the schema.xml.


Slowness of Solr search during the replication

2014-01-06 Thread sivaprasad
Hi,

I have configured Solr slave replication to run every hour. During this time
I am seeing that my search is unresponsive. Do we need to make any other
architectural changes to overcome this?

Here are some cache details from my solrconfig.xml.

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512" cleanupThread="true"
             autowarmCount="0"/>

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

<fieldValueCache class="solr.FastLRUCache"
                 size="512" cleanupThread="true"
                 autowarmCount="128"
                 showItems="32"/>

<useFilterForSortedQuery>true</useFilterForSortedQuery>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>

Any suggestions are appreciated.

Regards,
Siva http://smarttechie.org/  






Re: Storing MYSQL DATETIME field in solr as String

2014-01-06 Thread manju16832003
I found a way to store MySQL DateTime as a string in Solr

Here is the way.

In data-config.xml, in the SQL query, we can convert the date directly to
char:

CAST(l.creation_date as char) as creation_date,
CAST(l.modification_date as char) as modification_date,

In schema.xml:

<field name="creation_date" type="string" indexed="true" stored="true"
       multiValued="false" default=""/>
<field name="modification_date" type="string" indexed="true" stored="true"
       multiValued="false" default=""/>


The output would be:
 <str name="creation_date">2013-11-13 10:26:32</str>
 <str name="modification_date">2013-11-13 10:26:32</str>

This is exactly what I was looking for.

If you have any other ways, please feel free to share. :-)

Happy Solr!!!
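
A possible alternative, sketched here for comparison (untested; the table
name "listing" is a placeholder and the timestamps are assumed to be UTC):
if you want a real Solr date field instead of a string, MySQL's DATE_FORMAT
can emit the ISO-8601 form Solr expects:

-- format the MySQL DATETIME into Solr's date syntax
SELECT DATE_FORMAT(l.creation_date, '%Y-%m-%dT%H:%i:%sZ') AS creation_date,
       DATE_FORMAT(l.modification_date, '%Y-%m-%dT%H:%i:%sZ') AS modification_date
FROM listing l;

With type="date" on the fields in schema.xml, that keeps date math and range
queries available.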







Re: Exact match on KeywordTokenizer

2014-01-06 Thread André Maldonado
Hi Chris, thanks for your reply and sorry for my poorly explained question.

Here are some examples of indexed data (fieldname:propertyType):

Apartamento Padrão
Casa Padrão
Loft
Terreno

And some examples of the queries:

propertyType:"Apartamento Padrão"
propertyType:apartamento-padrao
propertyType:"Loft"
propertyType:loft

Using the analysis menu, I can see that the difference is in the double
quotes I'm providing when I search and that are not indexed.

How can I solve this?

Thanks






On Fri, Jan 3, 2014 at 9:15 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 Can you show us examples of the types of data you are indexing, and the
 types of queries you want to match? (as well as examples of queries you
 *don't* want to match)


 https://wiki.apache.org/solr/UsingMailingLists#Information_useful_for_searching_problems

 Best guess, based on your problem description, is that you are indexing
 text like "Foo Bar" and then searching for things like "foOBaR" and
 you want those to match.

 With your analyzer as it is, you will never get a match unless the client
 sending the query string has already lowercased it, done any asciifolding
 needed, and always sends "-" instead of space characters.

 I suspect what you really want is to have index & query analyzers that are
 the same (or at least better matches for each other than what you have
 below)...
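
 For illustration, a symmetric field type along those lines could look like
 this (an untested sketch that simply applies the same chain on both sides):

 <fieldType name="customtype" class="solr.TextField" positionIncrementGap="100">
   <!-- a single <analyzer> with no type attribute is used for both
        indexing and querying, so both sides normalize identically -->
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern=" " replacement="-"/>
   </analyzer>
 </fieldType>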


 : Hi,
 :
 : Is there a way to do an exact match search on a tokenized field?
 :
 : I have a scenario in which I need a field to be indexed and searchable
 : regardless of the case or white spaces used. For this, I created a custom
 : field type with the following configuration:
 :
 : <field name="propertyType" type="customtype" indexed="true" stored="true"/>
 :
 : <fieldType name="customtype" class="solr.TextField"
 :            positionIncrementGap="100">
 :   <analyzer type="index">
 :     <tokenizer class="solr.KeywordTokenizerFactory"/>
 :     <filter class="solr.ASCIIFoldingFilterFactory"/>
 :     <filter class="solr.LowerCaseFilterFactory"/>
 :     <filter class="solr.PatternReplaceFilterFactory" pattern=" "
 :             replacement="-"/>
 :   </analyzer>
 :   <analyzer type="query">
 :     <tokenizer class="solr.KeywordTokenizerFactory"/>
 :   </analyzer>
 : </fieldType>
 :
 : Even using KeywordTokenizerFactory on both index and query, all my
 : searches based on exact match stopped working.
 :
 : Is there a way to search exact match like a string field and at the same
 : time use custom tokenizers applied to that field?
 :
 : Thanks in advance

 -Hoss
 http://www.lucidworks.com/


Re: Solr -The connection has timed out

2014-01-06 Thread Cihat güzel
Browse to the following URL:
http://docs.lucidworks.com/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-IndexLocks
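
That page documents the lockType setting. As a hedged illustration (the
usual cause of this error is a stale write.lock left by an unclean shutdown,
or two Solr instances pointing at the same index directory), the relevant
solrconfig.xml section looks like this:

<indexConfig>
  <!-- "native" relies on OS-level file locking; a write.lock left behind
       by a crashed JVM has to be removed manually, after verifying that
       no other Solr process is still using the index -->
  <lockType>native</lockType>
</indexConfig>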



2013/12/31 Furkan KAMACI furkankam...@gmail.com

 Hi;

 Besides the other error lines, did you notice this log line:

 *java.net.BindException: Address already in use*

 Could you check whether any other application is using port 8983?

 Thanks;
 Furkan KAMACI


 2013/12/31 rakesh rakesh3...@yahoo.com

  Finally able to get the full log details
 
  ERROR - 2013-12-30 15:13:00.811; org.apache.solr.core.SolrCore;
  [collection1] Solr index directory
 
 
 '/ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index/'
  is locked.  Throwing exception
  INFO  - 2013-12-30 15:13:00.812; org.apache.solr.core.SolrCore;
  [collection1]  CLOSING SolrCore org.apache.solr.core.SolrCore@de26e52
  INFO  - 2013-12-30 15:13:00.812; org.apache.solr.update.SolrCoreState;
  Closing SolrCoreState
  INFO  - 2013-12-30 15:13:00.813;
  org.apache.solr.update.DefaultSolrCoreState; SolrCoreState ref count has
  reached 0 - closing IndexWriter
  INFO  - 2013-12-30 15:13:00.813; org.apache.solr.core.SolrCore;
  [collection1] Closing main searcher on request.
  INFO  - 2013-12-30 15:13:00.814;
  org.apache.solr.core.CachingDirectoryFactory; Closing
  NRTCachingDirectoryFactory - 2 directories currently being tracked
  INFO  - 2013-12-30 15:13:00.814;
  org.apache.solr.core.CachingDirectoryFactory; looking to close
  /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index
 
 
 [CachedDirrefCount=0;path=/ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index;done=false]
  INFO  - 2013-12-30 15:13:00.814;
  org.apache.solr.core.CachingDirectoryFactory; Closing directory:
  /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index
  INFO  - 2013-12-30 15:13:00.815;
  org.apache.solr.core.CachingDirectoryFactory; looking to close
  /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data
 
 
 [CachedDirrefCount=0;path=/ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data;done=false]
  INFO  - 2013-12-30 15:13:00.815;
  org.apache.solr.core.CachingDirectoryFactory; Closing directory:
  /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data
  ERROR - 2013-12-30 15:13:00.817; org.apache.solr.core.CoreContainer;
 Unable
  to create core: collection1
  org.apache.solr.common.SolrException: Index locked for write for core
  collection1
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
  at
 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:557)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:592)
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
  Caused by: org.apache.lucene.store.LockObtainFailedException: Index
 locked
  for write for core collection1
  at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:491)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:755)
  ... 13 more
  ERROR - 2013-12-30 15:13:00.819; org.apache.solr.common.SolrException;
  null:org.apache.solr.common.SolrException: Unable to create core:
  collection1
  at
  org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:977)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:601)
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
  Caused by: org.apache.solr.common.SolrException: Index locked for write
 for
  core collection1
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
  at
 
 

Re: Slowness of Solr search during the replication

2014-01-06 Thread Toke Eskildsen
On Mon, 2014-01-06 at 09:18 +0100, sivaprasad wrote:
 I have configured Solr slave replication to run every hour. During this time
 I am seeing that my search is unresponsive.

Unresponsive as in waiting for the updated searcher to be ready or as in
very slow while the replication is ongoing?

- Toke Eskildsen




monitoring solr system

2014-01-06 Thread elmerfudd
hi,
we have a cluster consisting of 6 servers: 3 leaders and 3 replicas.
The system must be alive and working 24x7. We would like to monitor the
system for any troubles or problems that may occur and will demand our
immediate support.
Currently we are monitoring the servers, the ZooKeeper and Jetty
processes, and query keep-alive.

Is there any other monitoring you would recommend?
Are there any important log messages we should pay attention to?

Thanks





Re: Slowness of Solr search during the replication

2014-01-06 Thread Erick Erickson
The first thing I'd try would be to up the autowarm counts. Don't go
overboard here; I'd suggest, say, 16 or so to start, but it depends on your
query mix.
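
On the slave, that could look like the following (a sketch based on the
config quoted below, changing only autowarmCount):

 <filterCache class="solr.FastLRUCache"
              size="512"
              initialSize="512" cleanupThread="true"
              autowarmCount="16"/>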

If that doesn't help, you need to add some more details. Some example
queries would be a place to start.

Best
Erick
On Jan 6, 2014 4:19 AM, sivaprasad sivaprasa...@echidnainc.com wrote:

 Hi,

 I have configured Solr slave replication to run every hour. During this time
 I am seeing that my search is unresponsive. Do we need to make any other
 architectural changes to overcome this?

 Here are some cache details from my solrconfig.xml.

 <filterCache class="solr.FastLRUCache"
              size="512"
              initialSize="512" cleanupThread="true"
              autowarmCount="0"/>

 <queryResultCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>

 <documentCache class="solr.LRUCache"
                size="512"
                initialSize="512"
                autowarmCount="0"/>

 <fieldValueCache class="solr.FastLRUCache"
                  size="512" cleanupThread="true"
                  autowarmCount="128"
                  showItems="32"/>

 <useFilterForSortedQuery>true</useFilterForSortedQuery>
 <useColdSearcher>false</useColdSearcher>
 <maxWarmingSearchers>2</maxWarmingSearchers>

 Any suggestions are appreciated.

 Regards,
 Siva http://smarttechie.org/







Re: monitoring solr system

2014-01-06 Thread Furkan KAMACI
Hi;

You can check here: http://sematext.com/spm/
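
Beyond external tools, a cheap in-Solr health check is the ping handler,
which a load balancer or monitoring script can poll; a minimal sketch (the
healthcheck file name is an arbitrary choice):

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <!-- deleting this file (or using the handler's enable/disable actions)
       makes the node report itself as down to whatever polls it -->
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>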

Thanks;
Furkan KAMACI


2014/1/6 elmerfudd na...@012.net.il

 hi,
 we have a cluster consisting of 6 servers: 3 leaders and 3 replicas.
 The system must be alive and working 24x7. We would like to monitor the
 system for any troubles or problems that may occur and will demand our
 immediate support.
 Currently we are monitoring the servers, the ZooKeeper and Jetty
 processes, and query keep-alive.

 Is there any other monitoring you would recommend?
 Are there any important log messages we should pay attention to?

 Thanks






Re: Slowness of Solr search during the replication

2014-01-06 Thread sivaprasad
Do we need to set the autowarmCount on the slave or the master? As per the
Solr wiki, I found the information below.

Solr4.0: autowarmCount can now be specified as a percentage (ie: 90%), which
will be evaluated relative to the number of items in the existing cache.
This can be an advantageous setting in an instance of Solr where you don't
expect any search traffic (ie a master), but you want some caches so that if
it does take on traffic it won't be too overloaded. Once the traffic dies
down, subsequent commits will gradually decrease the number of items being
warmed.

Regards,
Siva





Branch/Java questions re: contributing code

2014-01-06 Thread Ryan Cutter
1. Should we be using Java 6 or 7?  The docs say 1.6 (
http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on
trunk/ yields:

/lucene/common-build.xml:328: Minimum supported Java version is 1.7.

I don't get that error with branch_4x/ which leads to my next question.

2. Should work toward 4.X be done on trunk/ or branch_4x/?  It sounds like
patches should be based on trunk and then ported as necessary.

Thanks! Ryan


Error from SPLITSHARD that seems to be unrecoverable

2014-01-06 Thread cwhit
I'm using Solr 4.6 with SolrCloud.  I tried using the SPLITSHARD command and
it threw a series of exceptions, which have put my SolrCloud in a weird
state.  

Here is an image of my SolrCloud setup after a few tries at SPLITSHARD, all
of which fail. http://imgur.com/CFXJKfb  

Here is the log output. http://pastebin.com/7uC5PQsa  

The notable exception claims that it can't read stopwords.txt, but the file
is absolutely present locally at solr/conf/stopwords.txt, and it's present
in zookeeper at /configs/config1/stopwords.txt (I checked with zkCli.cmd). 
Here is the notable exception stack trace:

ERROR - 2013-12-20 20:18:24.231; org.apache.solr.core.CoreContainer; Unable
to create core: collection1_shard3_1_replica1 
java.lang.RuntimeException: java.io.IOException: Error opening
/configs/config1/stopwords.txt 
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:169) 
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55) 
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
 
at
org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:254) 
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:590) 
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:498)
 
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
 
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
 
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
 
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
 
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) 
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) 
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) 
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) 
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) 
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) 
at org.eclipse.jetty.server.Server.handle(Server.java:368) 
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
 
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
 
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) 
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) 
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) 
at java.lang.Thread.run(Unknown Source) 
Caused by: java.io.IOException: Error opening /configs/config1/stopwords.txt 
at
org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:83)
 
at
org.apache.lucene.analysis.util.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:255)
 
at
org.apache.lucene.analysis.util.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:243)
 
at
org.apache.lucene.analysis.core.StopFilterFactory.inform(StopFilterFactory.java:99)
 
at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:655) 
   at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:167) 
... 35 more 


I have two questions that stem from this: why is it happening, and how can I
solve it?  It seems to be having trouble locating the config 

Re: Branch/Java questions re: contributing code

2014-01-06 Thread Shalin Shekhar Mangar
On Mon, Jan 6, 2014 at 8:54 PM, Ryan Cutter ryancut...@gmail.com wrote:
 1. Should we be using Java 6 or 7?  The docs say 1.6 (
 http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on
 trunk/ yields:

 /lucene/common-build.xml:328: Minimum supported Java version is 1.7.

 I don't get that error with branch_4x/ which leads to my next question.

branch_4x is on Java 6 and trunk is on Java 7.


 2. Should work toward 4.X be done on trunk/ or branch_4x/?  It sounds like
 patches should be based on trunk then it gets ported as necessary.

 Thanks! Ryan

Yeah, you are right. Features are committed to trunk first and
backported to branch_4x.

-- 
Regards,
Shalin Shekhar Mangar.


RE: Branch/Java questions re: contributing code

2014-01-06 Thread Markus Jelsma
Trunk (5.x) requires Java 1.7; 4.x still works with 1.6. Check the
CHANGES.txt; you'll see it near the top.

 
 
-Original message-
 From:Ryan Cutter ryancut...@gmail.com
 Sent: Monday 6th January 2014 16:27
 To: solr-user@lucene.apache.org
 Subject: Branch/Java questions re: contributing code
 
 1. Should we be using Java 6 or 7?  The docs say 1.6 (
 http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on
 trunk/ yields:
 
 /lucene/common-build.xml:328: Minimum supported Java version is 1.7.
 
 I don't get that error with branch_4x/ which leads to my next question.
 
 2. Should work toward 4.X be done on trunk/ or branch_4x/?  It sounds like
 patches should be based on trunk then it gets ported as necessary.
 
 Thanks! Ryan
 


Re: need help on OpenNLP with Solr

2014-01-06 Thread rashi gandhi
Hi,

Also, I wanted to know:
Is it possible to integrate WordNet with this analyzer?
I want to use WordNet for synonym expansion along with the OpenNLP filters.
What changes are required in Solr's schema.xml and solrconfig.xml?
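
As a hedged pointer: Solr's SynonymFilterFactory can read WordNet prolog
files via format="wordnet", so a filter like the one below could be added to
the query analyzer, assuming the WordNet synonym file wn_s.pl has been
copied into the core's conf directory:

<filter class="solr.SynonymFilterFactory" synonyms="wn_s.pl"
        format="wordnet" ignoreCase="true" expand="true"/>

No solrconfig.xml change should be needed for this; it is purely a
schema.xml analyzer change.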

Thanks in Advance


On Mon, Jan 6, 2014 at 9:37 PM, rashi gandhi gandhirash...@gmail.comwrote:

 Hi,



 I have applied the OpenNLP patch (LUCENE-2899.patch) to Solr 4.5.1 for NLP
 searching and it is working fine.

 Also I have designed an analyzer for this:

 <fieldType name="nlp_type" class="solr.TextField"
            positionIncrementGap="100">

   <analyzer type="index">
     <tokenizer class="solr.OpenNLPTokenizerFactory"
                sentenceModel="opennlp/en-test-sent.bin"
                tokenizerModel="opennlp/en-test-tokenizer.bin"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.OpenNLPFilterFactory"
             posTaggerModel="opennlp/en-pos-maxent.bin"/>
     <filter class="solr.OpenNLPFilterFactory"
             nerTaggerModels="opennlp/en-ner-person.bin"/>
     <filter class="solr.OpenNLPFilterFactory"
             nerTaggerModels="opennlp/en-ner-location.bin"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"/>
   </analyzer>

   <analyzer type="query">
     <tokenizer class="solr.OpenNLPTokenizerFactory"
                sentenceModel="opennlp/en-test-sent.bin"
                tokenizerModel="opennlp/en-test-tokenizer.bin"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.OpenNLPFilterFactory"
             posTaggerModel="opennlp/en-pos-maxent.bin"/>
     <filter class="solr.OpenNLPFilterFactory"
             nerTaggerModels="opennlp/en-ner-person.bin"/>
     <filter class="solr.OpenNLPFilterFactory"
             nerTaggerModels="opennlp/en-ner-location.bin"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"/>
   </analyzer>

 </fieldType>


 I am able to see that the posTaggerModel is tagging the phrases and adding
 the payloads (but I am not able to analyze it).

 My question is:
 Can I search a phrase giving a higher boost to NOUN than to VERB?
 For example, if I am searching "sitting on blanket", I want to give a high
 boost first to the NOUN terms and then the VERB terms that are tagged by
 OpenNLP.
 How can I use payloads for boosting?
 What are the changes required in schema.xml?

 Please provide me some pointers to move ahead

 Thanks in advance






need help on OpenNLP with Solr

2014-01-06 Thread rashi gandhi
Hi,



I have applied the OpenNLP patch (LUCENE-2899.patch) to Solr 4.5.1 for NLP
searching and it is working fine.

Also I have designed an analyzer for this:

<fieldType name="nlp_type" class="solr.TextField"
           positionIncrementGap="100">

  <analyzer type="index">
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="opennlp/en-test-sent.bin"
               tokenizerModel="opennlp/en-test-tokenizer.bin"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.OpenNLPFilterFactory"
            posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="opennlp/en-test-sent.bin"
               tokenizerModel="opennlp/en-test-tokenizer.bin"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.OpenNLPFilterFactory"
            posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>

</fieldType>


I am able to see that the posTaggerModel is tagging the phrases and adding
the payloads (but I am not able to analyze it).

My question is:
Can I search a phrase giving a higher boost to NOUN than to VERB?
For example, if I am searching "sitting on blanket", I want to give a high
boost first to the NOUN terms and then the VERB terms that are tagged by
OpenNLP.
How can I use payloads for boosting?
What are the changes required in schema.xml?

Please provide me some pointers to move ahead

Thanks in advance


Index for csv-file created successfully, but no data is shown

2014-01-06 Thread Huynh, Chi-Hao
Dear solr users,

I would appreciate it if someone could help me out here. My goal is to index
a csv-file.

First of all, I am using the CDH 5 beta distribution of Hadoop, which includes 
solr 4.4.0, on a single node. I am following the hue tutorial to index and 
search the data from the yelp dataset challenge 
http://gethue.tumblr.com/post/65969470780/hadoop-tutorials-season-ii-7-how-to-index-and-search.

Following the tutorial, I have uploaded the config files, including the 
prepared schema.xml, to zookeeper via the solrctl-command:
solrctl instancedir --create reviews [path to conf]

After this, I have created the collection via:
solrctl collection --create reviews -s 1

This works fine, as I can see the collection created in the Solr Admin Web UI 
and the instancedir in the zookeeper shell.

Then, using the MapReduceIndexerTool and the provided morphline file the index 
is created and uploaded to solr. According to the command output the index was 
created successfully:

1481 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Indexing 1 
files using 1 real mappers into 1 reducers
52716 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Done. 
Indexing 1 files using 1 real mappers into 1 reducers took 51.233 secs
52774 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging of output 
shards into Solr cluster...
52829 [pool-4-thread-1] INFO  org.apache.solr.hadoop.GoLive  - Live merge 
hdfs://svr-hdp01:8020/tmp/load/results/part-0 into 
http://SVR-HDP01:8983/solr
53017 [pool-4-thread-1] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil  
- Creating new http client, 
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
53495 [main] INFO  org.apache.solr.hadoop.GoLive  - Committing live merge...
53496 [main] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil  - Creating 
new http client, config:
53512 [main] INFO  org.apache.solr.common.cloud.ConnectionManager  - Waiting 
for client to connect to ZooKeeper
53513 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  
- Watcher org.apache.solr.common.cloud.ConnectionManager@19014023 
name:ZooKeeperConnection Watcher:SVR-HDP01:2181/solr got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
53513 [main] INFO  org.apache.solr.common.cloud.ConnectionManager  - Client is 
connected to ZooKeeper
53514 [main] INFO  org.apache.solr.common.cloud.ZkStateReader  - Updating 
cluster state from ZooKeeper...
53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Done committing live merge
53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging of index 
shards into Solr cluster took 0.878 secs
53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging completed 
successfully
53652 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Succeeded 
with job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, 
jobId: job_1388405934175_0013
53653 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Success. 
Done. Program took 53.719 secs. Goodbye.

Now, when I go to the web UI and select the created core, I find the core to
be empty: the number of Docs is 0 and querying it yields no results. My
question is whether I have to upload the csv-file manually to somewhere on
the Solr server, since it seems the csv-file was parsed and indexed
successfully but the indexed data is missing.

I hope, the description of the problem was clear enough. Thanks a lot!
Kind regards



Re: How to boost documents ?

2014-01-06 Thread Anca Kopetz

Hi,

I tried to isolate the problem, so I tested the following query on solr-4.6.0:

http://localhost:8983/solr/collection1/select?q=ipod belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

The error is :
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'

And the stacktrace :

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
   at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
   at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
   at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
   at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
   at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
   at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
   at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
   at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
   at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
   at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
   at org.eclipse.jetty.server.Server.handle(Server.java:368)
   at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
   at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
   at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
   at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
   at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
   at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
   at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
   at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
   at org.apache.solr.search.QParser.subQuery(QParser.java:200)
   at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
   at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
   at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
   at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
   at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
   at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
   at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
   at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
   at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
   at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
   at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
   at 

SOLR Security - Displaying endpoints to public

2014-01-06 Thread Developer
Hi,

We are currently showing the SOLR endpoints to the public when using our
application (public users would be able to view the SOLR endpoints (/select)
and the query in the debugging console).

I am trying to figure out if there is any security threat in terms of
displaying the endpoints directly on the internet. We have disabled the
update handler in production, so I assume writes/updates are not possible.

The below URL mentions a point 'Solr does not concern itself with security
either at the document level or the communication level. It is strongly
recommended that the application server containing Solr be firewalled such
the only clients with access to Solr are your own.'

Is the above statement true even if we just display the read-only endpoints
to the public users? Can someone please advise?

http://wiki.apache.org/solr/SolrSecurity 





Re: adding wild card at the end of the text and search(like sql like search)

2014-01-06 Thread suren
By using the q command and passing the query parameter
defType=unorderedcomplexphrase, it worked for me.

http://localhost:8999/solr/MACSearch/select?q=LAST_NAM%3A%22DE+PAR*%22&wt=xml&indent=true&defType=unorderedcomplexphrase

Thanks.





Re: delta-import giving Total Documents Processed = 0

2014-01-06 Thread suren
I think the issue was with deltaImportQuery; it is case sensitive. I was using
'${dataimporter.delta.clai_idn}'
instead of '${dataimporter.delta.CLAI_IDN}'

<field column="CLAI_IDN" name="CLAI_IDN"/>
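
For context, a sketch of where that placeholder sits in data-config.xml (the
table name "claims" and the "modified" column are hypothetical; only
CLAI_IDN comes from this thread):

<entity name="claim" pk="CLAI_IDN"
        query="SELECT * FROM claims"
        deltaQuery="SELECT CLAI_IDN FROM claims
                    WHERE modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM claims
                          WHERE CLAI_IDN = '${dataimporter.delta.CLAI_IDN}'">
  <field column="CLAI_IDN" name="CLAI_IDN"/>
</entity>

The case of CLAI_IDN in deltaImportQuery has to match the column name that
deltaQuery returns, which is exactly the bug described above.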





Re: SOLR Security - Displaying endpoints to public

2014-01-06 Thread Shawn Heisey

On 1/6/2014 10:55 AM, Developer wrote:

We are currently showing the SOLR endpoints to the public when using our
application (public users would be able to view the SOLR endpoints (/select)
and the query in debugging console).

I am trying to figure out if there is any security threat in terms of
displaying the endpoints directly on the internet. We have disabled the
update handler in production, so I assume writes/updates are not possible.

The below URL mentions a point 'Solr does not concern itself with security
either at the document level or the communication level. It is strongly
recommended that the application server containing Solr be firewalled such
the only clients with access to Solr are your own.'

Is the above statement true even if we just display the read-only endpoints
to the public users? Can someone please advise?


Without an application between the public and Solr that sanitizes user 
input, an attacker can send denial of service queries to your Solr 
instance that will cause it to spin so hard it can't serve regular 
queries.  We can't block such things in server code, because sometimes 
such queries *are* legitimate, they just take a lot of resources and 
time to complete.


Even if you disable admin handlers so that it's impossible to gather 
full information about your schema and other settings, generating 
legitimate queries is probably enough for an attacker to get the 
information they need.


If your design is such that client-side scripting handles almost 
everything, you probably need to set up a proxy in front of Solr that's 
configured to deny things that look suspicious.  I do not know of any 
publicly available proxy configurations like this, and I have never come 
across any private ones either.


Thanks,
Shawn



Re: Function query matching

2014-01-06 Thread Peter Keegan
: The bottom line for Peter is still the same: using scale() wrapped around
: a function/query does involve computing the results for every document,
: and that is going to scale linearly as the size of the index grows -- but
: it is *only* because of the scale function.

Another problem with this approach is that the scale() function will likely
generate incorrect values because it occurs before any filters. If the
filters drop high scoring docs, the scaled values will never include the
'maxTarget' value (and may not include the 'minTarget' value, either).

Peter


On Sat, Dec 7, 2013 at 2:30 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 (This is why i shouldn't send emails just before going to bed.)

 I woke up this morning realizing that of course I was completley wrong
 when i said this...

 : I want to be clear for 99% of the people reading this, if you find
 : yourself writing a query structure like this...
 :
 :   q={!func}..functions involving wrapping $qq ...
 ...
 : ...Try to restructure the match you want to do into the form of a
 : multiplier
 ...
 : Because the latter case is much more efficient and Solr will only compute
 : the function values for the docs it needs to (that match the wrapped $qq
 : query)

 The reason i was wrong...

 Even though function queries do by default match all documents, and even
 if the main query is a function query (ie: q={!func}...), if there is
 an fq that filters down the set of documents, then the (main) function
 query will only be calculated for the documents that match the filter.

 It was trivial to amend the test I mentioned last night to show this (and
 I feel silly for not doing that last night and stopping myself from saying
 something foolish)...

   https://svn.apache.org/viewvc?view=revisionrevision=r1548955

 The bottom line for Peter is still the same: using scale() wrapped around
 a function/query does involve computing the results for every document,
 and that is going to scale linearly as the size of the index grows -- but
 it is *only* because of the scale function.



 -Hoss
 http://www.lucidworks.com/



MergePolicy for append-only indices?

2014-01-06 Thread Otis Gospodnetic
Hi,
(cross-posting to both Solr and Lucene user lists because while this is a
Lucene-level question, I suspect a lot of people who know about this or are
interested in this subject are actually on the Solr list)

I have a large append-only index and I looked at merge policies hoping to
identify one that is naturally more suitable for indices without any
updates and deletions, just adds.

I've read
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/index/TieredMergePolicy.html
and the javadocs for its cousins, but it doesn't look like any of them is
more suited for an append-only index than the other ones, and Tiered MP,
having more knobs, is probably the best one to use.

I was wondering if I was missing something, if one of the MPs is in fact
better for append-only indices OR if one can suggest how one could write a
custom MP that's specialized for append-only indices.

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


Re: SOLR Security - Displaying endpoints to public

2014-01-06 Thread Shawn Heisey

On 1/6/2014 11:18 AM, Shawn Heisey wrote:
Even if you disable admin handlers so that it's impossible to gather 
full information about your schema and other settings, generating 
legitimate queries is probably enough for an attacker to get the 
information they need.


Self-replying on this point: If you *don't* disable admin handlers, an 
attacker would also be able to simply unload the core and ask Solr to 
delete it from disk.


A side effect of disabling admin handlers is that the admin UI won't 
work either.  In terms of security hardening, that's a good thing ... 
but it makes it *very* difficult to gather useful information about your 
installation's health.


Thanks,
Shawn



Re: MergePolicy for append-only indices?

2014-01-06 Thread Shawn Heisey

On 1/6/2014 11:24 AM, Otis Gospodnetic wrote:

(cross-posting to both Solr and Lucene user lists because while this is a
Lucene-level question, I suspect a lot of people who know about this or are
interested in this subject are actually on the Solr list)

I have a large append-only index and I looked at merge policies hoping to
identify one that is naturally more suitable for indices without any
updates and deletions, just adds.

I've read
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/index/TieredMergePolicy.html
and the javadocs for its cousins, but it doesn't look like any of them is
more suited for an append-only index than the other ones, and Tiered MP,
having more knobs, is probably the best one to use.

I was wondering if I was missing something, if one of the MPs is in fact
better for append-only indices OR if one can suggest how one could write a
custom MP that's specialized for append-only indices.


The Tiered policy was made default for Solr back in the 3.x days.  
Defaults in both Solr and Lucene don't normally change without some 
serious thought about the repercussions.


As for what's best for different kinds of indexes (add-only vs 
update/delete) ... unless there are *enormous* numbers of deletions 
(whether from updates or pure delete requests), I don't think that 
affects the decision very much.  The Tiered policy seems like it's 
probably the best choice either way.  I assume you've seen the following 
blog post?


http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Thanks,
Shawn



Re: SOLR Security - Displaying endpoints to public

2014-01-06 Thread Raymond Wiker

On 06 Jan 2014, at 19:37 , Shawn Heisey s...@elyograg.org wrote:

 On 1/6/2014 11:18 AM, Shawn Heisey wrote:
 Even if you disable admin handlers so that it's impossible to gather full 
 information about your schema and other settings, generating legitimate 
 queries is probably enough for an attacker to get the information they need.
 
 Self-replying on this point: If you *don't* disable admin handlers, an 
 attacker would also be able to simply unload the core and ask Solr to delete 
 it from disk.
 
 A side effect of disabling admin handlers is that the admin UI won't work 
 either.  In terms of security hardening, that's a good thing ... but it makes 
 it *very* difficult to gather useful information about your installation's 
 health.
 

If you want to apply some sort of access restrictions on the content, you will 
need a mechanism to identify the user and add parameters to restrict the result 
set. You will also need to stop the user from circumventing this mechanism, 
which basically means that the raw Solr endpoints must not be accessible to 
the user.



DateField - Invalid JSON String Exception - converting Query Response to JSON Object

2014-01-06 Thread Amit Jha
Hi,

Wish You All a Very Happy New Year.

We have an index where a date field has the default value 'NOW'. We are using
SolrJ to query Solr, and when we try to convert the query response
(response.getResponse) to a JSON object in Java, the JSON API (org.json)
throws an 'invalid json string' exception. The API says so because the date
field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by double quotation
marks (""). So it says a ',' or '}' character is required when the API sees
the colon.

Could you please help me retrieve the date field value as a string in the
JSON response, or give me any pointers?

Any help would be highly appreciated.
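
One workaround, as an untested sketch (the org.json types are the ones
mentioned above; the rest is an editorial illustration): format Date values
into ISO strings yourself while building the JSON object, instead of relying
on the response's default string form:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.json.JSONArray;
import org.json.JSONObject;

public class DateSafeJson {
  public static JSONArray toJson(QueryResponse response) throws Exception {
    SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    iso.setTimeZone(TimeZone.getTimeZone("UTC"));  // Solr dates are UTC
    JSONArray out = new JSONArray();
    for (SolrDocument doc : response.getResults()) {
      JSONObject o = new JSONObject();
      for (String field : doc.getFieldNames()) {
        Object value = doc.getFieldValue(field);
        // Date values would otherwise end up unquoted in the JSON text
        o.put(field, value instanceof Date ? iso.format((Date) value) : value);
      }
      out.put(o);
    }
    return out;
  }
}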


Re: Slowness of Solr search during the replication

2014-01-06 Thread Mikhail Khludnev
Hello Siva,

Do you have an idea what makes them freeze?
Ideally you might be able to take a thread-dump at the moment of freeze, if
you can.

Also, check SolrIndexSearcher debug logs for autowarming timing.

What about specifying a few of the heaviest queries in a newSearcher listener?
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Query-RelatedListeners

+1 for bumping auto-warming on slaves.
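
For reference, a minimal sketch of such a listener in solrconfig.xml (the
queries shown are placeholders; put your heaviest real queries there):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">ipod</str><str name="sort">price asc</str></lst>
    <lst><str name="q">static newSearcher warming in solrconfig.xml</str></lst>
  </arr>
</listener>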


On Mon, Jan 6, 2014 at 4:34 PM, sivaprasad sivaprasa...@echidnainc.comwrote:

 Do we need to set the autowarmCount on the slave or the master? As per the
 Solr wiki, I found the information below.

 Solr4.0 autowarmCount can now be specified as a percentage (ie: 90%)
 which
 will be evaluated relative to the number of items in the existing cache.
 This can be an advantageous setting in an instance of Solr where you don't
 expect any search traffic (ie a master),
  but you want some caches so that if it does take on traffic it won't be
 too
 overloaded. Once the traffic dies down, subsequent commits will gradually
 decrease the number of items being warmed.

 Regards,
 Siva







-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: MergePolicy for append-only indices?

2014-01-06 Thread Michael Sokolov
I think the key optimization when there are no deletions is that you 
don't need to renumber documents and can bulk-copy blocks of contiguous 
documents, and that is independent of merge policy. I think :)


-Mike

On 01/06/2014 01:54 PM, Shawn Heisey wrote:

On 1/6/2014 11:24 AM, Otis Gospodnetic wrote:
(cross-posting to both Solr and Lucene user lists because while this 
is a
Lucene-level question, I suspect a lot of people who know about this 
or are

interested in this subject are actually on the Solr list)

I have a large append-only index and I looked at merge policies 
hoping to

identify one that is naturally more suitable for indices without any
updates and deletions, just adds.

I've read
 http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/index/TieredMergePolicy.html and
the javadocs for its cousins, but it doesn't look like any of them is
more suited for append-only index than the other ones and Tiered MP 
having

more knobs is probably the best one to use.

I was wondering if I was missing something, if one of the MPs is in fact
better for append-only indices OR if one can suggest how one could 
write a

custom MP that's specialized for append-only indices.


The Tiered policy was made default for Solr back in the 3.x days. 
Defaults in both Solr and Lucene don't normally change without some 
serious thought about the repercussions.


As for what's best for different kinds of indexes (add-only vs 
update/delete) ... unless there are *enormous* numbers of deletions 
(whether from updates or pure delete requests), I don't think that 
affects the decision very much.  The Tiered policy seems like it's 
probably the best choice either way.  I assume you've seen the 
following blog post?


http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html 



Thanks,
Shawn





Seemingly arbitrary error on document adds to SolrCloud - Server Error request: http://10.0.0.5:8443/solr/collection1/update?update.distrib=TOLEADER&distrib.from=...

2014-01-06 Thread cwhi
I'm adding dozens of documents every few minutes to a SolrCloud instance with
3 machines and ~ 25 million documents.  I'm starting to see issues where
adds are throwing these ugly errors that seem to indicate there might be
some issues with the nodes communicating to one another.  My posts are of
the following form, but with about 30 fields rather than just 1: 
<add>
  <doc>
    <field name="id">112370241</field>
  </doc>
</add>

And here is the error that Solr is throwing:

null:org.apache.solr.common.SolrException: Server Error

request:
http://10.0.0.5:8443/solr/collection1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.0.0.229%3A8443%2Fsolr%2Fcollection1%2F&wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)


What is the source of these errors, and how can I resolve them?





Setting max number of connections

2014-01-06 Thread Sir Gilligan
I am trying to increase the max number of connections allowed for queries
with SolrCloud.


I have searched around and found mention that:
max number of connections is 128
max number of connections per host is 32

I start solr in the example directory with some options, but basically 
it is just:

java -jar start.jar

How can I increase the two values above?

Is there some config file that needs changing?

While I wait to see what recommendations the community has to offer I am 
experimenting with the following that I read on the SolrConfigXml wiki:


  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
    <!-- other params go here -->
    <shardHandlerFactory class="HttpShardHandlerFactory">
      <int name="socketTimeOut">1000</int>
      <int name="connTimeOut">5000</int>
      <int name="maxConnectionsPerHost">512</int>
    </shardHandlerFactory>
  </requestHandler>

I added the echoParams and df=text entries because there was an error that a
faceting query threw.
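
A further, hedged note: HttpShardHandlerFactory also takes a maxConnections
parameter (check your version's code or reference guide to confirm), and in
the new-style solr.xml of Solr 4.4+ the factory can be declared once
globally instead of per request handler; an untested sketch:

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="maxConnections">512</int>
  <int name="maxConnectionsPerHost">128</int>
</shardHandlerFactory>

The 128/32 limits you found are the defaults HttpClientUtil logs when it
creates its HTTP client, which lines up with the flatline you observed at
128 users.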


Any help is appreciated.

I have been trying to apply load to our Solr system and I can barely get the
CPU on the boxes to budge. I noticed that during our tests, when we reached
128 users, the throughput flatlined, so I searched and sure enough I found
the 128 connection limit mentioned.


Thank you.


Re: SOLR Security - Displaying endpoints to public

2014-01-06 Thread Otis Gospodnetic
Apache url_rewrite can help with this and it's only a few minutes to set up.
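
A hedged sketch of that idea for Apache httpd, assuming mod_proxy and
mod_rewrite are loaded (the host, port, and blocked parameters are
illustrative):

# Expose only the /select handler, proxied to the internal Solr host
ProxyPass        /search http://127.0.0.1:8983/solr/collection1/select
ProxyPassReverse /search http://127.0.0.1:8983/solr/collection1/select

# Reject requests that try to smuggle in dangerous parameters
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)(qt|shards|stream\.url|stream\.file)= [NC]
RewriteRule . - [F]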

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jan 6, 2014 at 12:55 PM, Developer bbar...@gmail.com wrote:

 Hi,

 We are currently showing the SOLR endpoints to the public when using our
 application (public users would be able to view the SOLR endpoints
 (/select)
 and the query in debugging console).

 I am trying to figure out if there is any security threat in terms of
 displaying the endpoints directly on the internet. We have disabled the
 update handler in production, so I assume writes/updates are not possible.

 The below URL mentions a point 'Solr does not concern itself with security
 either at the document level or the communication level. It is strongly
 recommended that the application server containing Solr be firewalled such
 the only clients with access to Solr are your own.'

 Is the above statement true even if we just display the read-only endpoints
 to the public users? Can someone please advise?

 http://wiki.apache.org/solr/SolrSecurity






Re: Index for csv-file created successfully, but no data is shown

2014-01-06 Thread Otis Gospodnetic
Hi,

This may be a better question for the Cloudera Search mailing list.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jan 6, 2014 at 11:06 AM, Huynh, Chi-Hao hu...@initions.com wrote:

 Dear solr users,

 I would appreciate if someone can help me out here. My goal is to index a
 csv-file.

 First of all, I am using the CDH 5 beta distribution of Hadoop, which
 includes solr 4.4.0, on a single node. I am following the hue tutorial to
 index and search the data from the yelp dataset challenge
 http://gethue.tumblr.com/post/65969470780/hadoop-tutorials-season-ii-7-how-to-index-and-search
 .

 Following the tutorial, I have uploaded the config files, including the
 prepared schema.xml, to zookeeper via the solrctl-command:
 solrctl instancedir --create reviews [path to conf]

 After this, I have created the collection via:
 solrctl collection --create reviews -s 1

 This works fine, as I can see the collection created in the Solr Admin Web
 UI and the instancedir in the zookeeper shell.

 Then, using the MapReduceIndexerTool and the provided morphline file the
 index is created and uploaded to solr. According to the command output the
 index was created successfully:

 1481 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Indexing
 1 files using 1 real mappers into 1 reducers
 52716 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Done.
 Indexing 1 files using 1 real mappers into 1 reducers took 51.233 secs
 52774 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging of output
 shards into Solr cluster...
 52829 [pool-4-thread-1] INFO  org.apache.solr.hadoop.GoLive  - Live merge
 hdfs://svr-hdp01:8020/tmp/load/results/part-0 into
 http://SVR-HDP01:8983/solr
 53017 [pool-4-thread-1] INFO
  org.apache.solr.client.solrj.impl.HttpClientUtil  - Creating new http
 client,
 config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
 53495 [main] INFO  org.apache.solr.hadoop.GoLive  - Committing live
 merge...
 53496 [main] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil  -
 Creating new http client, config:
 53512 [main] INFO  org.apache.solr.common.cloud.ConnectionManager  -
 Waiting for client to connect to ZooKeeper
 53513 [main-EventThread] INFO
  org.apache.solr.common.cloud.ConnectionManager  - Watcher
 org.apache.solr.common.cloud.ConnectionManager@19014023 name:ZooKeeperConnection
  Watcher:SVR-HDP01:2181/solr got event WatchedEvent
 state:SyncConnected type:None path:null path:null type:None
 53513 [main] INFO  org.apache.solr.common.cloud.ConnectionManager  -
 Client is connected to ZooKeeper
 53514 [main] INFO  org.apache.solr.common.cloud.ZkStateReader  - Updating
 cluster state from ZooKeeper...
 53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Done committing live
 merge
 53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging of index
 shards into Solr cluster took 0.878 secs
 53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging completed
 successfully
 53652 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  -
 Succeeded with job: jobName:
 org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, jobId:
 job_1388405934175_0013
 53653 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Success.
 Done. Program took 53.719 secs. Goodbye.

 Now, when I go to the web UI and select the created core, I find the core
 to be empty: the number of docs is 0 and querying it returns no results. Since
 the command output suggests the csv-file was parsed and indexed successfully,
 my question is whether I have to upload the csv-file manually to somewhere on
 the Solr server, because the data that was indexed seems to be missing.
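
 As a quick sanity check (a sketch using the host and collection name from
 above), the collection can also be queried directly for its document count:

   curl "http://SVR-HDP01:8983/solr/reviews/select?q=*:*&rows=0&wt=json"
   # numFound in the response shows how many documents the collection holds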

 I hope the description of the problem was clear enough. Thanks a lot!
 Kind regards

 __
 initions AG
 Chi-Hao Huynh
 Weidestraße 120a
 D-22081 Hamburg

 t:   +49 (0) 40 / 41 49 60-62
 f:   +49 (0) 40 / 41 49 60-11
 e:  hu...@initios.com
 w: www.initions.com
 Full company name: initions innovative IT solutions AG
 Registered office: Hamburg
 Commercial register: Hamburg B 83929
 Chairman of the Supervisory Board: Dr. Michael Leue
 Executive Board: Dr. Stefan Anschütz, André Paul Henkel, Dr. Helge Plehn




Re: Branch/Java questions re: contributing code

2014-01-06 Thread Ryan Cutter
Thanks, everything worked fine after these pointers and I was able to
generate a patch properly.
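
For anyone hitting the same questions, a sketch of the flow from the
HowToContribute wiki that I followed (the issue number below is just a
placeholder):

  # Patches are made against trunk and backported to branch_4x as needed.
  svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk lucene-trunk
  cd lucene-trunk
  ant test                    # needs Java 7 on trunk (Java 6 on branch_4x)
  svn diff > SOLR-NNNN.patch  # attach the patch to the JIRA issue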

Cheers, Ryan


On Mon, Jan 6, 2014 at 7:31 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Jan 6, 2014 at 8:54 PM, Ryan Cutter ryancut...@gmail.com wrote:
  1. Should we be using Java 6 or 7?  The docs say 1.6 (
  http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on
  trunk/ yields:
 
  /lucene/common-build.xml:328: Minimum supported Java version is 1.7.
 
  I don't get that error with branch_4x/, which leads to my next question.

 branch_4x is on Java 6 and trunk is on Java 7.

 
  2. Should work toward 4.X be done on trunk/ or branch_4x/? It sounds like
  patches should be based on trunk and then get ported as necessary.
 
  Thanks! Ryan

 Yeah, you are right. Features are committed to trunk first and
 backported to branch_4x.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: DateField - Invalid JSON String Exception - converting Query Response to JSON Object

2014-01-06 Thread Amit Jha
Hi,


We have an index where a date field has the default value 'NOW'. We are using
SolrJ to query Solr, and when we try to convert the query
response (response.getResponse()) to a JSON object in Java, the JSON
API (org.json) throws an 'invalid JSON string' exception. The API says so
because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by
double quotation marks (" "). So it reports that a ',' or '}' character is
required when it sees the colon.

Could you please help me retrieve the date field value as a string in the JSON
response? Or any pointers.

Any help would be highly appreciated.


On Tue, Jan 7, 2014 at 12:28 AM, Amit Jha shanuu@gmail.com wrote:

 Hi,

 Wish You All a Very Happy New Year.

 We have an index where a date field has the default value 'NOW'. We are using
 SolrJ to query Solr, and when we try to convert the query
 response (response.getResponse()) to a JSON object in Java, the JSON
 API (org.json) throws an 'invalid JSON string' exception. The API says so
 because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by
 double quotation marks (" "). So it reports that a ',' or '}' character is
 required when it sees the colon.

 Could you please help me retrieve the date field value as a string in the
 JSON response? Or any pointers.

 Any help would be highly appreciated.






Re: DateField - Invalid JSON String Exception - converting Query Response to JSON Object

2014-01-06 Thread Ahmet Arslan
Hi Amit,

If you want a JSON response, why don't you use wt=json?
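
For example, fetching the select handler directly with wt=json returns
well-formed JSON in which dates are quoted strings (a minimal sketch; host,
port, and collection name are assumptions):

  import java.io.InputStream;
  import java.net.URL;
  import java.util.Scanner;

  public class RawJsonQuery {
      public static void main(String[] args) throws Exception {
          // Hypothetical core URL; adjust to your setup.
          URL url = new URL(
              "http://localhost:8983/solr/collection1/select?q=*:*&wt=json");
          InputStream in = url.openStream();
          try {
              // Read the whole body; date values arrive as quoted strings.
              Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A");
              System.out.println(s.hasNext() ? s.next() : "");
          } finally {
              in.close();
          }
      }
  }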

Ahmet






pagination with grouping

2014-01-06 Thread Senthilnathan Vijayaraja
Hi,

I am using group.query like below:

  group=true
  group.query=_query_:"{!frange l=0 u=10 v=$score}"
  group.query=_query_:"{!frange l=10 u=20 v=$score}"
  group.query=_query_:"{!frange l=20 u=30 v=$score}"

Here I want to restrict the overall record count; start=0&rows=10 is not
working here.
Within a group we can do this using offset=0&group.limit=10.

For example, I want only 10 records: if the first group
(group.query=_query_:"{!frange l=0 u=10 v=$score}") contains 10 records, I
don't need records from the other two groups.
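
(For reference, the same parameters joined into a single request — a sketch
with a placeholder host, collection, and main query; the score parameter is
assumed to be supplied elsewhere in the request:)

  http://localhost:8983/solr/collection1/select?q=...&group=true
      &group.query=_query_:"{!frange l=0 u=10 v=$score}"
      &group.query=_query_:"{!frange l=10 u=20 v=$score}"
      &group.query=_query_:"{!frange l=20 u=30 v=$score}"
      &start=0&rows=10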

Could someone help me, please?



Thanks & Regards,
Senthilnathan V