RE: date range tree

2013-11-13 Thread Andreas Owen
I solved it by adding a loop for the years and one for the quarters, in which I
count the month facets.
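
For reference, a minimal SolrJ sketch of that approach (not the original code;
the core URL and class name are hypothetical, only the last_modified field and
the facet.date parameters are taken from the config quoted further down): run
the +1MONTH date facet once, then sum the month counts into quarter and year
buckets in two loops.

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DateTreeFacets {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // hypothetical URL
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.set("facet.date", "last_modified");
    q.set("facet.date.start", "NOW/MONTH-36MONTHS");
    q.set("facet.date.end", "NOW/MONTH");
    q.set("facet.date.gap", "+1MONTH");
    QueryResponse rsp = solr.query(q);

    Map<String, Long> years = new LinkedHashMap<String, Long>();
    Map<String, Long> quarters = new LinkedHashMap<String, Long>();
    for (FacetField dateFacet : rsp.getFacetDates()) {
      for (FacetField.Count month : dateFacet.getValues()) {
        String name = month.getName();                  // e.g. 2013-10-01T00:00:00Z
        if (!name.matches("\\d{4}-\\d{2}.*")) continue; // skip before/after/between buckets
        String year = name.substring(0, 4);
        int m = Integer.parseInt(name.substring(5, 7));
        String quarter = year + "-Q" + ((m - 1) / 3 + 1);
        long count = month.getCount();
        years.put(year, (years.containsKey(year) ? years.get(year) : 0L) + count);
        quarters.put(quarter, (quarters.containsKey(quarter) ? quarters.get(quarter) : 0L) + count);
      }
    }
    System.out.println("per year:    " + years);
    System.out.println("per quarter: " + quarters);
  }
}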

-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Monday, 11 November 2013 17:52
To: solr-user@lucene.apache.org
Subject: RE: date range tree

Has someone at least got an idea how I could do a year/month date tree?

In the Solr wiki it is mentioned that facet.date.gap=+1DAY,+2DAY,+3DAY,+10DAY
should create 4 buckets, but it doesn't work.


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, 7 November 2013 18:23
To: solr-user@lucene.apache.org
Subject: date range tree

I would like to make a facet on a date field with the following tree:

2013
    4. Quarter
        December
        November
        October
    3. Quarter
        September
        August
        July
    2. Quarter
        June
        May
        April
    1. Quarter
        March
        February
        January
2012
    (same as above)

So far I have this in solrconfig.xml:

 

<str name="facet.date">{!ex=last_modified,thema,inhaltstyp,doctype}last_modified</str>

<str name="facet.date.gap">+1MONTH</str>

<str name="facet.date.end">NOW/MONTH</str>

<str name="facet.date.start">NOW/MONTH-36MONTHS</str>

<str name="facet.date.other">after</str>

 

Can I do this in one query or do I need multiple queries? If I need multiple,
how would I do the second one and keep all the facet queries in the count?




Re: serialization error - BinaryResponseWriter

2013-11-13 Thread giovanni.bricc...@banzai.it
Mhhh, I run a dih full reload every night, and the source field is a 
sqlserver smallint column...


By the way I'll try cleaning the data dir of the index and reindexing

On 12/11/13 17:13, Shawn Heisey wrote:

On 11/12/2013 2:37 AM, giovanni.bricc...@banzai.it wrote:

I'm getting some errors reading boolean fields, can you give me any
suggestions? In this example I only have four false fields:
leasing=false, FiltroNovita=false, FiltroFreeShipping=false, Outlet=false.

this is the stack trace (solr 4.2.1)

java.lang.NumberFormatException: For input string: false
 at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

 at java.lang.Integer.parseInt(Integer.java:492)
 at java.lang.Integer.valueOf(Integer.java:582)
 at org.apache.solr.schema.IntField.toObject(IntField.java:89)
 at org.apache.solr.schema.IntField.toObject(IntField.java:43)
 at
org.apache.solr.response.BinaryResponseWriter$Resolver.getValue(BinaryResponseWriter.java:223)

Solr stores boolean values internally as a number - 0 or 1.  That gets
changed to true/false when displaying search results.

It sounds like what you have here is quite possibly an index which
originally had text fields with the literal string true or false,
and you've changed your schema so these fields are now boolean.  When
you change your schema, you have to reindex.

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn





Re: Multi-Tenant Setup in Single Core

2013-11-13 Thread Christian Ramseyer
On 11/12/13 5:20 PM, Shawn Heisey wrote:
 Ensure that all handler names start with a slash character, so they are
 things like /query, /select, and so on.  Make sure that handleSelect
 is set to false on your requestDispatcher config.  This is how Solr 4.x
 examples are set up already.
 
 With that config, the qt parameter will not function and will be
 ignored -- you must use the request handler path as part of the URL --
 /solr/corename/handler.


Great thanks, I already had it this way but I wasn't aware of these fine
details, very helpful.

Christian




Re: Modify the querySearch to q=*:*

2013-11-13 Thread Alvaro Cabrerizo
Hi:

First of all, I have to say that I had never heard of *\* as the query to
get all the documents in an index, only *:* (maybe I'm wrong). Re-reading
Apache Solr 4 Cookbook, Solr 1.4 Enterprise Search Server and Apache
Solr 3 Enterprise Search Server, there is no trace of the query *\* as the
universal query to get every doc.

If you enable debugQuery
(http://wiki.apache.org/solr/CommonQueryParameters#debugQuery) you can see
that *:* is transformed into MatchAllDocsQuery(*:*) (Solr 1.4 and Solr 4.4),
which means give me all the documents, but the query *\* is transformed into
something else. In my case, with a default field called description defined
in the schema, in Solr 1.4 I get description:*\\*, which means give me all
the documents that have the char \ in the field description, and in Solr 4.4
I get description:**, which also gets all the documents in the index. It
would be helpful to see how *\* is interpreted in your system (Solr 3.5 and
Solr 4).

I think the best way to solve your problem is to modify the system which
sends the request to Solr so that it replaces *\* with *:* (if that is
possible). I don't know whether Solr can make that kind of translation
itself, I mean changing *\* into *:*. One possible workaround, with
collateral damage, is to include a PatternReplaceCharFilterFactory (in
schema.xml) in the field types you use for searching, in order to delete
every \ character from the input, or even to include an expression that
transforms *\* into *:*. But including that element in your schema means it
will always be applied during your searches (so if your users type a\b they
will search for ab). If you want to explore that path, I recommend using
the analysis tool
(https://cwiki.apache.org/confluence/display/solr/Analysis+Screen) included
in Solr.

Regards.
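
For illustration, a trivial sketch of the first suggestion above - rewriting
the query on the client before it reaches Solr (class and method names are
hypothetical):

public class QueryNormalizer {
    // Replace the literal query *\* with the match-all query *:* before sending it to Solr.
    static String normalize(String userQuery) {
        return "*\\*".equals(userQuery.trim()) ? "*:*" : userQuery;
    }

    public static void main(String[] args) {
        System.out.println(normalize("*\\*"));   // prints *:*
    }
}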

On Wed, Nov 13, 2013 at 2:34 AM, Shawn Heisey s...@elyograg.org wrote:

 On 11/12/2013 6:03 PM, Abhijith Jain -X (abhijjai - DIGITAL-X INC at
 Cisco) wrote:

 I am trying to set the query to q=*:* permanently. I tried to set q=*:*
 in SolrConfig.xml file as follows.

 <requestHandler name="standard" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="echoParams">none</str>
     <str name="q">*:*</str>
   </lst>
 </requestHandler>

 But this didn’t help. Please advise how to change query to q=*:* in Solr
 4.4.


 This configuration sets the default for the q parameter to *:*, but if the
 actual query that is sent to Solr has a q parameter, it will override that
 default.

 In the very unlikely situation that you don't want to ever do any query
 besides *:*, you can put that setting into the invariants section instead
 of the defaults section - but be aware that if you do that, you will never
 be able to send any other query. Normally your application decides what the
 query string should be, not Solr.

 I concur with Jack's recommendation that you migrate to the 4.x way of
 naming handlers.  You would need to set handleSelect to false and change
 all your search handlers so their name starts with a slash.  The one that
 is currently named standard would instead be named /select and you
 would need to remove the default=true setting.
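
 For illustration, a minimal SolrJ sketch (URL hypothetical) of addressing a
 slash-named handler once handleSelect is false; in SolrJ 4.x a
 setRequestHandler() value starting with a slash is used as the request path:

 import org.apache.solr.client.solrj.SolrQuery;
 import org.apache.solr.client.solrj.SolrServer;
 import org.apache.solr.client.solrj.impl.HttpSolrServer;

 public class SlashHandlerQuery {
     public static void main(String[] args) throws Exception {
         SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // hypothetical URL
         SolrQuery q = new SolrQuery("*:*");
         q.setRequestHandler("/select");   // used as the request path because it starts with a slash
         System.out.println(solr.query(q).getResults().getNumFound());
     }
 }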

 Thanks,
 Shawn




Re: solrcloud - forward update to a shard failed

2013-11-13 Thread michael.boom
Do you do your commit from the two indexing clients or have the autocommit
set to maxDocs = 1000 ?



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100633.html
Sent from the Solr - User mailing list archive at Nabble.com.


Updating Document Score With Payload of Multivalued Field?

2013-11-13 Thread Furkan KAMACI
Here is my case;

I have a field in my schema named *elmo_field*. I want *elmo_field* to hold
multiple values, each with a payload, i.e.

dorothy|0.46
sesame|0.37
big bird|0.19
bird|0.22

When a user searches for a keyword, e.g. *dorothy*, I want to add 0.46 to the
score; if the user searches for *big bird*, 0.19; and if the user searches for
*bird*, 0.22.

I mean that I will search the other fields of my Solr schema, and at the same
time run another (exact-match) search against *elmo_field*; if it matches
something, I will increase the score by the payload.

How can I do that: add something to the score from a multivalued payload (with
a nested query or not)? Do you have any other ideas to achieve this?


Re: Updating Document Score With Payload of Multivalued Field?

2013-11-13 Thread Furkan KAMACI
PS: I use Solr 4.5.1


2013/11/13 Furkan KAMACI furkankam...@gmail.com

 Here is my case;

 I have a field at my schema named *elmo_field*. I want that *elmo_field* 
 should
 have multiple values and multiple payloads. i.e.

 dorothy|0.46
 sesame|0.37
 big bird|0.19
 bird|0.22

 When a user searches for a keyword i.e. *dorothy* I want to add 0.46 to
 score. If user searches for *big bird *0.19 and if user searches for *bird
 *0.22

 I mean I will make a search on my index at my other fields of solr schema.
  And I will make another search (this one is an exact match search) at
 *elmo_field* at same time and if matches something I will increase score
 with payloads.

 How can I do that: adding something to score at multivalued payload (with
 a nested query or not) and do you have any other ideas to achieve that?






Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Alexandre Rafalovitch
So, it sounds like either Solr is treated as a webapp, in which case it is
installed alongside most of the other webapps under Tomcat (a
legacy/operational reason), and then the Solr docs just need to explain how to
deploy under Tomcat while the rest of the documentation/tooling comes from the
Tomcat community.

Or Solr is treated not as a webapp but as a black box, in which case it needs
to support and explain all the operational requirements (deployment,
extension, monitoring) that are currently waved away as a 'container issue'.

Regards,
   Alex.
P.S. I also agree that the example directory layout has become very confusing
and may need to be rethought. Probably a discussion for a different thread, if
somebody has a thought-out suggestion.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Nov 12, 2013 at 8:32 PM, Gopal Patwa gopalpa...@gmail.com wrote:

 My case is also similar to Sujit Pal but we have jboss6.


 On Tue, Nov 12, 2013 at 9:47 AM, Sujit Pal sujit@comcast.net wrote:

  In our case, it is because all our other applications are deployed on
  Tomcat and ops is familiar with the deployment process. We also had
  customizations that needed to go in, so we inserted our custom JAR into
 the
  solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr
  was (almost, except for schema.xml or solrconfig.xml changes) identical
 to
  any of the other apps. But I think if Solr becomes a server with clearly
  defined extension points (such as dropping your custom JARs into lib/ and
  custom configuration in conf/solrconfig.xml or similar like it already
 is)
  then it will be treated as something other than a webapp and the
  expectation that it runs on Tomcat will not apply.
 
  Just my $0.02...
 
  Sujit
 
 
 
  On Tue, Nov 12, 2013 at 9:13 AM, Siegfried Goeschl sgoes...@gmx.at
  wrote:
 
   Hi ALex,
  
   in my case
  
   * ignorance that Tomcat is not fully supported
   * Tomcat configuration and operations know-how inhouse
   * could migrate to Jetty but need approved change request to do so
  
   Cheers,
  
   Siegfried Goeschl
  
   On 12.11.13 04:54, Alexandre Rafalovitch wrote:
  
   Hello,
  
   I keep seeing here and on Stack Overflow people trying to deploy Solr
 to
   Tomcat. We don't usually ask why, just help when where we can.
  
   But the question happens often enough that I am curious. What is the
   actual
   business case. Is that because Tomcat is well known? Is it because
 other
   apps are running under Tomcat and it is ops' requirement? Is it
 because
   Tomcat gives something - to Solr - that Jetty does not?
  
   It might be useful to know. Especially, since Solr team is considering
   making the server part into a black box component. What use cases will
   that
   break?
  
   So, if somebody runs Solr under Tomcat (or needed to and gave up),
 let's
   use this thread to collect this knowledge.
  
   Regards,
   Alex.
   Personal website: http://www.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
 at
   once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
  
  
 



(info) lucene first search performance

2013-11-13 Thread Jacky.J.Wang (mis.cnsh04.Newegg) 41361


Dear lucene


In order to test the Solr search performance, I disabled all of the Solr caches.
[inline screenshot of the cache configuration omitted]

I inserted 10 million documents and found that the first search is very slow
(700ms) while a second identical search is very quick (20ms). I am sure no Solr
cache is involved.

This problem has been bothering me for a month.



Tracing the source code, I found:



[inline screenshot of the traced code omitted]

The first invocation of the readVIntBlock method is always very slow, and the
second invocation is very quick. I don't know the reason for this.



Eagerly awaiting your reply, thanks very much!!!




Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Dmitry Kan
Hi,

Reading that people have considered deploying the example folder is slightly
strange to me. No wonder they are confused and confuse their ops. We just took
vanilla Jetty (Jetty 9), installed solr.war on it and configured it, with no
example folders at all. It has worked nicely since then.

The main reason for us to get away from Tomcat, which we had used originally,
was that it felt too heavy for running a Solr webapp which isn't using anything
Tomcat-specific. In older versions (Tomcat 6) it would leak memory and threads.
We knew that Jetty is mature enough, lighter, and used at large companies like
Google. This was convincing enough to try.

We are still using Tomcat for other webapps, specifically for clustering
and load balancing between webapp instances, but that is not needed for our
Solr installation at this point.

Regards,

Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan



On Wed, Nov 13, 2013 at 1:42 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 So, it sounds like that either Solr is treated as a webapp, in which case
 it is installed with most of the webapps under Tomcat (legacy/operational
 reason). So, Solr docs just needs to explain how to deploy under Tomcat and
 the rest of document/tooling comes from Tomcat community.

 Or, if Solr is treated not as a webapp but as a black box, it needs to
 support and explain all the operational requirements (deployment,
 extension, monitoring) that are currently waved away as a 'container
 issue'.

 Regards,
Alex.
 P.s. I also agree that example directory layout is become very confusing
 and may need to be re-thought. Probably a discussion for a different
 thread, if somebody has a thought out suggestion.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Tue, Nov 12, 2013 at 8:32 PM, Gopal Patwa gopalpa...@gmail.com wrote:

  My case is also similar to Sujit Pal but we have jboss6.
 
 
  On Tue, Nov 12, 2013 at 9:47 AM, Sujit Pal sujit@comcast.net
 wrote:
 
   In our case, it is because all our other applications are deployed on
   Tomcat and ops is familiar with the deployment process. We also had
   customizations that needed to go in, so we inserted our custom JAR into
  the
   solr.war's WEB-INF/lib directory, so to ops the process of deploying
 Solr
   was (almost, except for schema.xml or solrconfig.xml changes) identical
  to
   any of the other apps. But I think if Solr becomes a server with
 clearly
   defined extension points (such as dropping your custom JARs into lib/
 and
   custom configuration in conf/solrconfig.xml or similar like it already
  is)
   then it will be treated as something other than a webapp and the
   expectation that it runs on Tomcat will not apply.
  
   Just my $0.02...
  
   Sujit
  
  
  
   On Tue, Nov 12, 2013 at 9:13 AM, Siegfried Goeschl sgoes...@gmx.at
   wrote:
  
Hi ALex,
   
in my case
   
* ignorance that Tomcat is not fully supported
* Tomcat configuration and operations know-how inhouse
* could migrate to Jetty but need approved change request to do so
   
Cheers,
   
Siegfried Goeschl
   
On 12.11.13 04:54, Alexandre Rafalovitch wrote:
   
Hello,
   
I keep seeing here and on Stack Overflow people trying to deploy
 Solr
  to
Tomcat. We don't usually ask why, just help when where we can.
   
But the question happens often enough that I am curious. What is the
actual
business case. Is that because Tomcat is well known? Is it because
  other
apps are running under Tomcat and it is ops' requirement? Is it
  because
Tomcat gives something - to Solr - that Jetty does not?
   
It might be useful to know. Especially, since Solr team is
 considering
making the server part into a black box component. What use cases
 will
that
break?
   
So, if somebody runs Solr under Tomcat (or needed to and gave up),
  let's
use this thread to collect this knowledge.
   
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
  at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
   
   
  
 



Re: distributed search is significantly slower than direct search

2013-11-13 Thread Erick Erickson
One thing you can try, and this is more diagnostic than a cure, is to return
just the id field (and ensure that lazy field loading is true). That will tell
you whether the issue is actually fetching the documents off disk and
decompressing them, although frankly that's unlikely since you can get your
5,000 rows from a single machine quickly.

The code you found where Solr is spending its time, is that on the
routing core
or on the shards? I actually have a hard time understanding how that
code could take a long time, doesn't seem right.

You are transferring 5,000 docs across the network, so it's possible that
your network is just slow, that's certainly a difference between the local
and remote case, but that's a stab in the dark.

Not much help I know,
Erick
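
If it helps, a minimal SolrJ sketch of that diagnostic (the core and shard URLs
are the ones from the original mail below; everything else is hypothetical):
fetch only the id field through the distributed request and compare the
wall-clock time with the direct case.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class IdOnlyDiagnostic {
  public static void main(String[] args) throws Exception {
    // routing core and shard URL taken from the original mail below
    SolrServer solr = new HttpSolrServer("http://127.0.0.1:8983/solr/template");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(5000);
    q.setFields("id");                             // return only the id field
    q.set("shards", "127.0.0.1:8983/solr/core1");
    long start = System.currentTimeMillis();
    solr.query(q);
    // compare this wall-clock time against the direct (non-distributed) query
    System.out.println("took " + (System.currentTimeMillis() - start) + " ms");
  }
}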



On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir elr...@checkpoint.com wrote:

 Erick, Thanks for your response.

 We are upgrading our system using Solr.
 We need to preserve old functionality.  Our client displays 5K documents
 and groups them.

 Is there a way to refactor code in order to improve distributed documents
 fetching?

 Thanks.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, October 30, 2013 3:17 AM
 To: solr-user@lucene.apache.org
 Subject: Re: distributed search is significantly slower than direct search

 You can't. There will inevitably be some overhead in the distributed case.
 That said, 7 seconds is quite long.

 5,000 rows is excessive, and probably where your issue is. You're having
 to go out and fetch the docs across the wire. Perhaps there is some
 batching that could be done there, I don't know whether this is one
 document per request or not.

 Why 5K docs?

 Best,
 Erick


 On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir elr...@checkpoint.com wrote:

  Hi all,
 
  I am using Solr 4.4 with multi cores. One core (called template) is my
  routing core.
 
  When I run
  http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
  it consistently takes about 7s.
  When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
  consistently takes about 40ms.
 
  I profiled the distributed query.
  This is the distributed query process (I hope the terms are accurate):
  When solr identifies a distributed query, it sends the query to the
  shard and get matched shard docs.
  Then it sends another query to the shard to get the Solr documents.
  Most time is spent in the last stage in the function process of
  QueryComponent in:
 
  for (int i = 0; i < idArr.size(); i++) {
    int id = req.getSearcher().getFirstMatch(
        new Term(idField.getName(),
            idField.getType().toInternal(idArr.get(i))));
 
  How can I make my distributed query as fast as the direct one?
 
  Thanks.
 


 Email secured by Check Point



Re: (info) lucene first search performance

2013-11-13 Thread fbrisbart
Solr uses the MMap Directory by default.

What you see is surely a filesystem cache.
Once a file is accessed, it's memory mapped.
Restarting solr won't reset it.


On unix, you may reset this cache with
  echo 3 > /proc/sys/vm/drop_caches


Franck Brisbart


On Wednesday, 13 November 2013 at 11:58, Jacky.J.Wang
(mis.cnsh04.Newegg) 41361 wrote:
  
 
 Dear lucene
 
  
 
 In order to test the solr search performance ,I closed all the cache
 solr
 
 
 
 insert into the 10 million data,and find  the first search very
 slowly(700ms),and the secondary search very quick(20ms),I am
 sure no solr cache.
 
 This problem bothering me for a month,
 
  
 
 Tracing the source code found
 
  
 
 [inline screenshot omitted]
 
 Fisrt  invoke readVIntBlock method always very slowly  ,and secondary
 invoke readVIntBlock method is very quick, I don't know what reason is
 this
 
  
 
 Eagerly awaiting your reply, thanks very much!!!
 
  
 
  
 
 




Re: solrcloud - forward update to a shard failed

2013-11-13 Thread Aileen
Explicit commits after writing 1000 docs in a batch from both indexing clients. 
 No auto commit.

Thanks.

 
 -Original Message
 
 Do you do your commit from the two indexing clients or have the autocommit 
 set to maxDocs = 1000 ?
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100633.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: (info) lucene first search performance

2013-11-13 Thread Erick Erickson
I have to ask a different question: why would you disable
the caches? You're trying to test worst-case times, perhaps?

Because the caches are an integral part of Solr performance,
disabling them artificially reduces your performance
numbers. So disabling them is useful for answering the question
"how bad can it get", but it's also skewing your results.

FWIW,
Erick


On Wed, Nov 13, 2013 at 7:42 AM, fbrisbart fbrisb...@bestofmedia.comwrote:

 Solr uses the MMap Directory by default.

 What you see is surely a filesystem cache.
 Once a file is accessed, it's memory mapped.
 Restarting solr won't reset it.


 On unix, you may reset this cache with
   echo 3  /proc/sys/vm/drop_caches


 Franck Brisbart


  On Wednesday, 13 November 2013 at 11:58, Jacky.J.Wang
  (mis.cnsh04.Newegg) 41361 wrote:
 
 
  Dear lucene
 
 
 
  In order to test the solr search performance ,I closed all the cache
  solr
 
 
 
  insert into the 10 million data,and find  the first search very
  slowly(700ms),and the secondary search very quick(20ms),I am
   sure no solr cache.
 
  This problem bothering me for a month,
 
 
 
  Tracing the source code found
 
 
 
   [inline screenshot omitted]
 
  Fisrt  invoke readVIntBlock method always very slowly  ,and secondary
  invoke readVIntBlock method is very quick, I don't know what reason is
  this
 
 
 
  Eagerly awaiting your reply, thanks very much!!!
 
 
 
 
 
 





Re: Modify the querySearch to q=*:*

2013-11-13 Thread Jack Krupansky
Just in case anybody is curious what *\* would really mean: the backslash
means to escape the following character, which in this case means don't
treat the second asterisk as a wildcard. But since the initial asterisk was
not escaped (the full rule is that if there is any unescaped wildcard in a
term then all of the escaped wildcards are treated as unescaped, since
Lucene has no support for escaping in WildcardQuery), any escaping of
wildcards in the term is ignored. So *\* is treated as **, and ** is
redundant and matches the same as *, so a *\* query would simply match all
documents that have a value in the default search field. In many cases this
would give identical results to a *:* query, but in some apps it might not.
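
One way to check this in a particular installation, as Alvaro suggested, is to
compare the parsed queries with debugQuery. A minimal SolrJ sketch (the core
URL is hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CompareParsedQueries {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // hypothetical URL
    for (String query : new String[] {"*:*", "*\\*"}) {   // the second literal is *\*
      SolrQuery q = new SolrQuery(query);
      q.set("debugQuery", "true");
      QueryResponse rsp = solr.query(q);
      // "parsedquery" shows what the query parser actually built for each input
      System.out.println(query + " -> " + rsp.getDebugMap().get("parsedquery"));
    }
  }
}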


Still it would be nice to know who originated this suggestion to use *\* 
instead of *:* - or even simply *.


-- Jack Krupansky

-Original Message- 
From: Alvaro Cabrerizo

Sent: Wednesday, November 13, 2013 4:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Modify the querySearch to q=*:*

Hi:

First of all I have to say that I had never heard about *\* as the query to
get all the documents in a index but *:*  (maybe I'm wrong) . Re-reading
Apache Solr 4 cookbook, Solr 1.4 Enterprise Search Server and  Apache
Solr 3 Enterprise Search Server there is no trace for the query *\* as the
universal query to get every doc.

If you enable 
debugQueryhttp://wiki.apache.org/solr/CommonQueryParameters#debugQuery

you
can see that *:* is transformed into MatchAllDocsQuery(*:*) (Solr1.4 and
Solr4.4) wich means give me all the documents, but the query *\* is
transformed into other thing (In my case having a default field called
description defined in the schema) I get in Solr1.4 description:*\\* wich
means give all the documents that have the char \ in the field description
and in SOLR1.4  I get description:** which also gets all the documents in
the index. It would be helpful to see how is interpreted *\* in your system
(solr3.5 and solr4).

I think, the best way to solve your problem Is to modify the system which
launches the request to SOLR and modify *\* by *:* (if it is possible). I
dont know if SOLR can make that kind of translation, I mean change *\* by
*:*.  One possible workaround with collateral damages is the inclusion of a
PatternReplaceCharFilterFactory (in schema.xml) within the fieldtypes you
use to search in order to delete every \ character included in the input or
even include the expression to transform *\* into *:* . But including that
element in your schema means that it will always be used during your search
(thus if your users type a\b they will search ab). If you want to explore
that path I recommend you to use the analysis
toolhttps://cwiki.apache.org/confluence/display/solr/Analysis+Screenincluded
in solr.

Regards.













On Wed, Nov 13, 2013 at 2:34 AM, Shawn Heisey s...@elyograg.org wrote:


On 11/12/2013 6:03 PM, Abhijith Jain -X (abhijjai - DIGITAL-X INC at
Cisco) wrote:


I am trying to set the query to q=*:* permanently. I tried to set q=*:*
in SolrConfig.xml file as follows.

<requestHandler name="standard" class="solr.SearchHandler"
default="true">

  <lst name="defaults">
  <str name="echoParams">none</str>
  <str name="q">*:*</str>
  </lst>
</requestHandler>

But this didn’t help. Please advise how to change query to q=*:* in Solr
4.4.



This configuration sets the default for the q parameter to *:*, but if the
actual query that is sent to Solr has a q parameter, it will override that
default.

In the very unlikely situation that you don't want to ever do any query
besides *:*, you can put that setting into the invariants section instead
of the defaults section - but be aware that if you do that, you will never
be able to send any other query. Normally your application decides what the
query string should be, not Solr.

I concur with Jack's recommendation that you migrate to the 4.x way of
naming handlers.  You would need to set handleSelect to false and change
all your search handlers so their name starts with a slash.  The one that
is currently named standard would instead be named /select and you
would need to remove the default=true setting.

Thanks,
Shawn






Re: solrcloud - forward update to a shard failed

2013-11-13 Thread michael.boom
I did something like that also, and I was getting some nasty problems when
one of my clients would try to commit before a commit issued by another one
had finished. It might be the same problem for you too.

Try not doing explicit commits from the indexing clients and instead set the
autocommit to 1000 docs or whichever value fits you best.
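
For illustration, a minimal sketch (not the posters' code; URL and field names
hypothetical) of indexing a batch without an explicit commit, using
commitWithin so Solr schedules the commit itself:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // hypothetical URL
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);               // hypothetical field/values
      batch.add(doc);
    }
    // no explicit commit(): ask Solr to commit within 30 seconds instead,
    // so concurrent clients never race each other on commits
    solr.add(batch, 30000);
  }
}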




-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100670.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLRJ API to do similar CURL command execution

2013-11-13 Thread Anupam Bhattacharya
I am able to perform the xml atomic update properly using curl commands.
However the moment I try to achieve the same using the solrj APIs I am
facing problems.

What should be the equivalent SOLRJ api code to perform similar action
using the below CURL command ?

curl "http://search1.es.dupont.com:8080/solr/core1/update" -H
"Content-Type: text/xml" --data-binary "<add><doc><field
name=\"id\">uniqueid</field><field name=\"tags\"
update=\"add\">updatefieldvalue</field></doc></add>"

I have attempted the code below, but it fails to add the field in the proper
manner, as it gets set as {add=[updatefieldvalue]}.

QueryResponse qs2 = solr.query(params2);
Map<String, List<String>> operation = new HashMap<String, List<String>>();
List<String> vals = new ArrayList<String>();
vals.add(tag);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", (String) qs2.getResults().get(j).get("id"));
operation.put("add", vals);
doc.addField("tags", operation);

Thanks in advance for any inputs.

Regards
Anupam
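
For comparison, the usual SolrJ atomic-update pattern looks roughly like this:
the field value is a Map whose key names the operation, and the document still
has to be sent with add() and committed. A minimal sketch reusing the id/tags
fields above (not verified against this setup):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicAddToTags {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://search1.es.dupont.com:8080/solr/core1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "uniqueid");
    // the field value is a Map whose key names the atomic operation
    Map<String, Object> addToTags = new HashMap<String, Object>();
    addToTags.put("add", "updatefieldvalue");
    doc.addField("tags", addToTags);
    solr.add(doc);
    solr.commit();
  }
}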


Re: SOLRJ API to do similar CURL command execution

2013-11-13 Thread Anupam Bhattacharya
How can I post the whole XML string to SOLR using its SOLRJ API ?


On Wed, Nov 13, 2013 at 6:50 PM, Anupam Bhattacharya anupam...@gmail.comwrote:

 I am able to perform the xml atomic update properly using curl commands.
 However the moment I try to achieve the same using the solrj APIs I am
 facing problems.

 What should be the equivalent SOLRJ api code to perform similar action
 using the below CURL command ?

 curl http://search1.es.dupont.com:8080/solr/core1/update; -H
 Content-Type: text/xml --data-binary adddocfield
 name=\id\uniqueid/fieldfield name=\tags\
 update=\add\updatefieldvalue/field/doc/add

 I have attempted below code but it fails to add the field in proper manner
 as it get set as {add=[updatefieldvalue]}.

 QueryResponse qs2 = solr.query(params2);
 MapString, ListString operation = new HashMapString, ListString();
 ListString vals = new ArrayListString();
 vals.add(tag);
 SolrInputDocument doc = new SolrInputDocument();
 doc.addField(id, (String)qs2.getResults().get(j).get(id));
 operation.put(add,vals);
 doc.addField(tags, operation);

 Thanks in advance for any inputs.

 Regards
 Anupam




-- 
Thanks  Regards
Anupam Bhattacharya


Updating an entry in Solr

2013-11-13 Thread gohome190
Hi,
I've been researching how to update a specific field of an entry in Solr,
and it seems like the only way to do this is a delete followed by an add. Is
there a better way? If I want to change one field, do I have to store the
whole entry locally, delete it from the Solr index, and then add it again with
the new field? That seems like a big missing feature if so!

Thanks
Zach



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Updating an entry in Solr

2013-11-13 Thread gohome190
Okay, so I've found in the Solr tutorial that if you do a POST command and
post a new entry with the same uniqueKey (in my case, id_) as an entry
already in the index, Solr will automatically replace it for you. That
seems to be what I need, right?
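
For illustration, a minimal SolrJ sketch of that replace-by-uniqueKey behavior
(the URL and the title field are hypothetical; id_ is the uniqueKey mentioned
above):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ReplaceByUniqueKey {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // hypothetical URL
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id_", "existing-id");   // same uniqueKey value as the document already in the index
    doc.addField("title", "new title");   // hypothetical field
    solr.add(doc);                        // overwrites the old document entirely
    solr.commit();
  }
}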



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLRJ API to do similar CURL command execution

2013-11-13 Thread Koji Sekiguchi

(13/11/13 22:25), Anupam Bhattacharya wrote:

How can I post the whole XML string to SOLR using its SOLRJ API ?




The source code of SimplePostTool would be of some help:

http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/SimplePostTool.html

koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html


Re: Updating an entry in Solr

2013-11-13 Thread primoz . skale
Yes, that's correct. You can also update a document per field, but all
fields need to be stored=true, because Solr (version >= 4.0) first gets
your document from the index, creates a new document with the modified field,
and adds it again to the index...

Primoz



From:   gohome190 gohome...@gmail.com
To: solr-user@lucene.apache.org
Date:   13.11.2013 14:39
Subject:Re: Updating an entry in Solr



Okay, so I've found in the solr tutorial that if you do a POST command and
post a new entry with the same uniquekey (in my case, id_) as an entry
already in the index, solr will automatically replace it for you.  That
seems to be what I need, right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html

Sent from the Solr - User mailing list archive at Nabble.com.



RE: Data Import Handler

2013-11-13 Thread Ramesh
James, can you elaborate on how to process driver="${dataimporter.request.driver}"
url="${dataimporter.request.url}", and where to put all of this? My purpose is
to configure my DB details (url, username, password) in a properties file.

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with dataimporter.request, you can
include variables like these as request parameters:

<dataSource name="ds" driver="${dataimporter.request.driver}"
url="${dataimporter.request.url}" />

/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:

<requestHandler name="/dih"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="driver">${dih.driver}</str>
<str name="url">${dih.url}</str>
</lst>
</requestHandler>

Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the data-config.xml file?

I want to provide database details (db_url, username, password) from my own
properties file instead of in the data-config.xml file.





Re: Updating an entry in Solr

2013-11-13 Thread Furkan KAMACI
You should read here: http://wiki.apache.org/solr/Atomic_Updates


2013/11/13 primoz.sk...@policija.si

 Yes, that's correct. You can also update document per field but all
 fields need to be stored=true, because Solr (version = 4.0) first gets
 your document from the index, creates new document with modified field,
 and adds it again to the index...

 Primoz



 From:   gohome190 gohome...@gmail.com
 To: solr-user@lucene.apache.org
 Date:   13.11.2013 14:39
 Subject:Re: Updating an entry in Solr



 Okay, so I've found in the solr tutorial that if you do a POST command and
 post a new entry with the same uniquekey (in my case, id_) as an entry
 already in the index, solr will automatically replace it for you.  That
 seems to be what I need, right?



 --
 View this message in context:

 http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html

 Sent from the Solr - User mailing list archive at Nabble.com.




RE: Data Import Handler

2013-11-13 Thread Dyer, James
In solrcore.properties, put:

datasource.url=jdbc:xxx:yyy
datasource.driver=com.some.driver

In solrconfig.xml, put:

<requestHandler name="/dih"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
...
<str name="dsDriver">${datasource.driver}</str>
<str name="dsUrl">${datasource.url}</str>
...
</lst>
</requestHandler>

In data-config.xml, put:
<dataSource name="ds" driver="${dataimporter.request.dsDriver}"
url="${dataimporter.request.dsUrl}" />

Hope this works for you.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com] 
Sent: Wednesday, November 13, 2013 9:00 AM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

James can elaborate how to process driver=${dataimporter.request.driver} 
url =${dataimporter.request.url} and all where to mention these 
my purpose is to config my DB Details(url,uname,password) in properties file

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with dataimporter.request, you can
include variables like these as request parameters:

dataSource name=ds driver=${dataimporter.request.driver}
url=${dataimporter.request.url} /

/dih?driver=some.driver.classurl=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:

requestHandler name=/dih
class=org.apache.solr.handler.dataimport.DataImportHandler
lst name=defaults
str name=driver${dih.driver}/str
str name=url${dih.url}/str
/lst
/requestHandler

Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest me how can customize dataconfig.xml file 

I want to provide database details like( db_url,uname,password ) from my own
properties file instead of dataconfig.xaml file







RE: Data Import Handler

2013-11-13 Thread Ramesh
They need to be put outside of Solr, in a customized properties file like
Mysolr_core.properties - how do I access it from there?

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 13, 2013 8:50 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

In solrcore.properties, put:

datasource.url=jdbc:xxx:yyy
datasource.driver=com.some.driver

In solrconfig.xml, put:

requestHandler name=/dih
class=org.apache.solr.handler.dataimport.DataImportHandler
lst name=defaults
... 
str name=dsDriver${datasource.driver}/str
str name=dsUrl${datasource.url}/str
...
/lst
/requestHandler

In data-config.xml, put:
dataSource name=ds driver=${dataimporter.request.dsDriver}
url=${dataimporter.request.dsUrl} /

Hope this works for you.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 13, 2013 9:00 AM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

James can elaborate how to process driver=${dataimporter.request.driver} 
url =${dataimporter.request.url} and all where to mention these my purpose
is to config my DB Details(url,uname,password) in properties file

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com]
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with dataimporter.request, you can
include variables like these as request parameters:

dataSource name=ds driver=${dataimporter.request.driver}
url=${dataimporter.request.url} /

/dih?driver=some.driver.classurl=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:

requestHandler name=/dih
class=org.apache.solr.handler.dataimport.DataImportHandler
lst name=defaults
str name=driver${dih.driver}/str
str name=url${dih.url}/str
/lst
/requestHandler

Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest me how can customize dataconfig.xml file 

I want to provide database details like( db_url,uname,password ) from my own
properties file instead of dataconfig.xaml file









Strange behavior of gap fragmenter on highlighting

2013-11-13 Thread Ing. Jorge Luis Betancourt Gonzalez
I'm seeing some strange behavior of the gap fragmenter on Solr 3.6. Right now
this is my configuration for the gap fragmenter:

  <fragmenter name="gap"
              default="true"
              class="solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">150</int>
    </lst>
  </fragmenter>

This is the basic configuration; I just tweaked the fragsize parameter to get
shorter fragments. The thing is that for one particular PDF document in my
results I get a really long snippet, way over 150 characters. It gets a little
more odd: if I change the 150 value to 100, the snippet for the same document
is normal, around 100 characters. The type of the field being highlighted is
this:

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory"
        ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas about what's happening? Or how I could debug what is really going
on?

Greetings!

III International Winter School at UCI, 17-28 February 2014. See www.uci.cu


Re: High disk IO during UpdateCSV

2013-11-13 Thread Utkarsh Sengar
Bumping this one again, any suggestions?


On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 Hello,

 I load data from CSV into Solr via UpdateCSV. There are about 50M documents
 with 10 columns in each document. The index size is about 15GB and I am
 using a 3-node distributed Solr cluster.

 While loading the data, the disk I/O goes to 100%. If the load balancer in
 front of Solr hits the machine which is doing the processing, then the
 request times out. But in general, requests to all the machines become
 slow. I have attached a screenshot of the disk I/O and CPU usage.

 Is there a fix in Solr which can possibly throttle the load, or maybe it's
 due to the MergePolicy? How can I debug Solr to get the exact cause?

 --
 Thanks,
 -Utkarsh




-- 
Thanks,
-Utkarsh


Re: High disk IO during UpdateCSV

2013-11-13 Thread Michael Della Bitta
Utkarsh,

Your screenshot didn't come through. I don't think this list allows
attachments. Maybe put it up on imgur or something?

I'm a little unclear on whether you're using Solr in Cloud mode, or with a
single master.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Wed, Nov 13, 2013 at 11:22 AM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 Bumping this one again, any suggestions?


 On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com
 wrote:

  Hello,
 
  I load data from csv to solr via UpdateCSV. There are about 50M documents
  with 10 columns in each document. The index size is about 15GB and I am
  using a 3 node distributed solr cluster.
 
  While loading the data the disk IO goes to 100%. if the load balancer in
  front of solr hits the machine which is doing the processing then the
  request times out. But in general, requests to all the machines become
  slow. I have attached a screenshot of the diskI/O and CPU usage.
 
  Is there a fix in solr which can possibly throttle the load or maybe its
  due to MergePolicy? How can I debug solr to get the exact cause?
 
  --
  Thanks,
  -Utkarsh
 



 --
 Thanks,
 -Utkarsh



Re: High disk IO during UpdateCSV

2013-11-13 Thread Utkarsh Sengar
Hi Michael,

I am using solr cloud 4.5.
And update csv loads data to one of these nodes.
Attachment: http://i.imgur.com/1xmoNtt.png


Thanks,
-Utkarsh


On Wed, Nov 13, 2013 at 8:33 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Utkarsh,

 Your screenshot didn't come through. I don't think this list allows
 attachments. Maybe put it up on imgur or something?

 I'm a little unclear on whether you're using Solr in Cloud mode, or with a
 single master.

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062  | c: +1 917 477 7906

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 w: appinions.com http://www.appinions.com/


 On Wed, Nov 13, 2013 at 11:22 AM, Utkarsh Sengar utkarsh2...@gmail.com
 wrote:

  Bumping this one again, any suggestions?
 
 
  On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com
  wrote:
 
   Hello,
  
   I load data from csv to solr via UpdateCSV. There are about 50M
 documents
   with 10 columns in each document. The index size is about 15GB and I am
   using a 3 node distributed solr cluster.
  
   While loading the data the disk IO goes to 100%. if the load balancer
 in
   front of solr hits the machine which is doing the processing then the
   request times out. But in general, requests to all the machines become
   slow. I have attached a screenshot of the diskI/O and CPU usage.
  
   Is there a fix in solr which can possibly throttle the load or maybe
 its
   due to MergePolicy? How can I debug solr to get the exact cause?
  
   --
   Thanks,
   -Utkarsh
  
 
 
 
  --
  Thanks,
  -Utkarsh
 




-- 
Thanks,
-Utkarsh


Re: High disk IO during UpdateCSV

2013-11-13 Thread Walter Underwood
Don't load 50M documents in one shot. Break it up into reasonable chunks 
(100K?) with commits at each point.

You will have a bottleneck somewhere, usually disk or CPU. Yours appears to be 
disk. If you get faster disks, it might become the CPU.
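
For illustration, a minimal SolrJ sketch of that chunking approach (URL and
file names hypothetical; assumes a /update/csv handler is configured and that
the CSV was split beforehand, e.g. with the unix split command):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ChunkedCsvLoader {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // hypothetical URL
    // chunk files produced beforehand, e.g. with: split -l 100000 big.csv chunk-
    for (String path : args) {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
      req.addFile(new File(path), "text/csv");
      req.setParam("commit", "true");   // commit after each chunk, as suggested above
      solr.request(req);
    }
  }
}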

wunder

On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:

 Bumping this one again, any suggestions?
 
 
 On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote:
 
 Hello,
 
 I load data from csv to solr via UpdateCSV. There are about 50M documents
 with 10 columns in each document. The index size is about 15GB and I am
 using a 3 node distributed solr cluster.
 
 While loading the data the disk IO goes to 100%. if the load balancer in
 front of solr hits the machine which is doing the processing then the
 request times out. But in general, requests to all the machines become
 slow. I have attached a screenshot of the diskI/O and CPU usage.
 
 Is there a fix in solr which can possibly throttle the load or maybe its
 due to MergePolicy? How can I debug solr to get the exact cause?
 
 --
 Thanks,
 -Utkarsh
 
 
 
 
 -- 
 Thanks,
 -Utkarsh

--
Walter Underwood
wun...@wunderwood.org





Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Shawn Heisey

On 11/13/2013 5:29 AM, Dmitry Kan wrote:

Reading that people have considered deploying example folder is slightly
strange to me. No wonder they are confused and confuse their ops.


I do use the stripped jetty included in the example, but my setup is not 
a straight copy of the example directory. I removed a lot of it and 
changed how jars get loaded.  I built my own init script from scratch, 
tailored for my setup.


I'll start a new thread with my init script and some info about how I 
installed Solr.


Thanks,
Shawn



Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Mark Miller
RE: the example folder

It’s something I’ve been pushing towards moving away from for a long time - see 
https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to 
'server' and pull examples into an 'examples’ directory

Part of a push I’ve been on to own the Container level (people are now on board 
with that for 5.0), add start scripts, and other niceties that we should have 
but don’t yet.

Even our config files should move away from being an “example” and end up more 
like a default starting template. Like a database, it should be simple to 
create a collection without needing to deal with config - you want to deal with 
the config when you need to, not face it all up front every time it is time to 
create a new collection.

IMO, the name example is historical - most people already use it this way, the 
name just confuses matters.

- Mark


On Nov 13, 2013, at 12:30 PM, Shawn Heisey s...@elyograg.org wrote:

 On 11/13/2013 5:29 AM, Dmitry Kan wrote:
 Reading that people have considered deploying example folder is slightly
 strange to me. No wonder they are confused and confuse their ops.
 
 I do use the stripped jetty included in the example, but my setup is not a 
 straight copy of the example directory. I removed a lot of it and changed how 
 jars get loaded.  I built my own init script from scratch, tailored for my 
 setup.
 
 I'll start a new thread with my init script and some info about how I 
 installed Solr.
 
 Thanks,
 Shawn
 



Re: My setup - init script and other info

2013-11-13 Thread Palmer, Eric
Thank you. This will help me a lot. 

Sent from my iPhone

On Nov 13, 2013, at 10:08 AM, Shawn Heisey s...@elyograg.org wrote:

 In the hopes that it will help someone get Solr running in a very clean way, 
 here's an informational email.
 
 For my Solr install on CentOS 6, I use /opt/solr4 as my installation path, 
 and /index/solr4 as my solr home.  The /index directory is a dedicated 
 filesystem, /opt is part of the root filesystem.
 
 From the example directory, I copied cloud-scripts, contexts, etc, lib, 
 webapps, and start.jar over to /opt/solr4.  My stuff was created before 
 4.3.0, so the resources directory didn't exist.  I was already using log4j 
 with a custom Solr build, and I put my log4j.properties file in etc instead.  
 I created a logs directory and a run directory in /opt/solr4.
 
 My data structure in /index/solr4 is complex.  All a new user really needs to 
 know is that solr.xml goes here and dictates the rest of the structure.  
 There is a symlink at /index/solr4/lib, pointing to /opt/solr4/solrlib - so 
 that jars placed in ${solr.solr.home}/lib are actually located in the program 
 directory, not the data directory.  That makes for a much cleaner version 
 control scenario - both directories are git repositories cloned from our 
 internal git server.
 
 Unlike the example configs, my solrconfig.xml files do not have lib 
 directives for loading jars.  That gets automatically handled by the jars 
 living in that symlinked lib directory.  See SOLR-4852 for caveats regarding 
 central lib directories.
 
 https://issues.apache.org/jira/browse/SOLR-4852
 
 If you want to run SolrCloud, you would need to install zookeeper separately 
 and put your zkHost parameter in solr.xml.  Due to a bug, putting zkHost in 
 solr.xml doesn't work properly until 4.4.0.
 
 Here's the current state of my init script.  It's redhat-specific.  I used 
 /bin/bash (instead of /bin/sh) in the shebang because I am pretty sure that 
 there are bash-isms in it, and bash is always available on the systems that I 
 use:
 
 http://apaste.info/9fVA
 
 Notable features:
 * Runs Solr as an unprivileged user.
 * Has three methods for stopping Solr, tries graceful methods first.
 1) The jetty STOPPORT/STOPKEY mechanism.
 2) PID saved by the 'start' action.
 3) Any program using the Solr listening port.
 * Before killing by PID, tries to make sure that the process actually is Solr.
 * Sets up remote JMX, by default without authentication or SSL.
 * Highly tuned CMS garbage collection.
 * Sets up GC logging.
 * Virtually everything is overridable via /etc/sysconfig/solr4.
 * Points at an overridable log4j config file, by default in /opt/solr4/etc.
 * Removes the existing PID file if the server is just booting up -- which it 
 knows by noting that server uptime is less than three minutes.
 
 It shouldn't be too hard to convert this so it works on debian-derived 
 systems.  That would involve rewriting portions that use redhat init 
 routines, and probably start-stop-daemon. What I'd really like is one script 
 that will work on any system, but that will require a fair amount of work.
 
 It's a work in progress.  It should load log4j.properties from resources 
 instead of etc. I'd like to include it in the Solr download, but without a 
 fair amount of documentation and possibly an installation script, which still 
 must be written, that won't be possible.
 
 Feel free to ask questions about anything that doesn't seem clear. I welcome 
 ideas for improvement on both my own setup and the solr example.
 
 Thanks,
 Shawn
 


Atomic Update at Solrj For a Newly Added Schema Field

2013-11-13 Thread Furkan KAMACI
I use Solr 4.5.1. I have indexed some documents and decided to add a new
field to my schema some time later. I want to use atomic updates for that
newly added field. I use SolrJ for indexing. However, because the newly added
field does not exist in the already-indexed documents, Solr does not make an
atomic update for them. I do not want to reindex my whole data. Any ideas?


My setup - init script and other info

2013-11-13 Thread Shawn Heisey
In the hopes that it will help someone get Solr running in a very clean 
way, here's an informational email.


For my Solr install on CentOS 6, I use /opt/solr4 as my installation 
path, and /index/solr4 as my solr home.  The /index directory is a 
dedicated filesystem, /opt is part of the root filesystem.


From the example directory, I copied cloud-scripts, contexts, etc, lib, 
webapps, and start.jar over to /opt/solr4.  My stuff was created before 
4.3.0, so the resources directory didn't exist.  I was already using 
log4j with a custom Solr build, and I put my log4j.properties file in 
etc instead.  I created a logs directory and a run directory in /opt/solr4.


My data structure in /index/solr4 is complex.  All a new user really 
needs to know is that solr.xml goes here and dictates the rest of the 
structure.  There is a symlink at /index/solr4/lib, pointing to 
/opt/solr4/solrlib - so that jars placed in ${solr.solr.home}/lib are 
actually located in the program directory, not the data directory.  That 
makes for a much cleaner version control scenario - both directories are 
git repositories cloned from our internal git server.


Unlike the example configs, my solrconfig.xml files do not have lib 
directives for loading jars.  That gets automatically handled by the 
jars living in that symlinked lib directory.  See SOLR-4852 for caveats 
regarding central lib directories.


https://issues.apache.org/jira/browse/SOLR-4852

If you want to run SolrCloud, you would need to install zookeeper 
separately and put your zkHost parameter in solr.xml.  Due to a bug, 
putting zkHost in solr.xml doesn't work properly until 4.4.0.


Here's the current state of my init script.  It's redhat-specific.  I 
used /bin/bash (instead of /bin/sh) in the shebang because I am pretty 
sure that there are bash-isms in it, and bash is always available on the 
systems that I use:


http://apaste.info/9fVA

Notable features:
* Runs Solr as an unprivileged user.
* Has three methods for stopping Solr, tries graceful methods first.
 1) The jetty STOPPORT/STOPKEY mechanism.
 2) PID saved by the 'start' action.
 3) Any program using the Solr listening port.
* Before killing by PID, tries to make sure that the process actually is 
Solr.

* Sets up remote JMX, by default without authentication or SSL.
* Highly tuned CMS garbage collection.
* Sets up GC logging.
* Virtually everything is overridable via /etc/sysconfig/solr4.
* Points at an overridable log4j config file, by default in /opt/solr4/etc.
* Removes the existing PID file if the server is just booting up -- 
which it knows by noting that server uptime is less than three minutes.


It shouldn't be too hard to convert this so it works on debian-derived 
systems.  That would involve rewriting portions that use redhat init 
routines, and probably start-stop-daemon. What I'd really like is one 
script that will work on any system, but that will require a fair amount 
of work.


It's a work in progress.  It should load log4j.properties from resources 
instead of etc. I'd like to include it in the Solr download, but without 
a fair amount of documentation and possibly an installation script, 
which still must be written, that won't be possible.


Feel free to ask questions about anything that doesn't seem clear. I 
welcome ideas for improvement on both my own setup and the solr example.


Thanks,
Shawn



Using data-config.xml from DIH in SolrJ

2013-11-13 Thread P Williams
Hi All,

I'm building a utility (Java jar) to create SolrInputDocuments and send them
to a HttpSolrServer using the SolrJ API.  The intention is to find an
efficient way to create documents from a large directory of files (where
multiple files make one Solr document) and send them to a remote Solr
instance for update and commit.
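
For context, the plain SolrJ path looks roughly like this - a sketch only,
where the directory layout, field names and URL are illustrative:

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://remotehost:8983/solr/collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        File root = new File("/data/source");
        for (File dir : root.listFiles()) {
            if (!dir.isDirectory()) continue;
            // several files contribute to one Solr document
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", dir.getName());
            doc.addField("body", new String(
                Files.readAllBytes(new File(dir, "body.txt").toPath()), StandardCharsets.UTF_8));
            doc.addField("metadata", new String(
                Files.readAllBytes(new File(dir, "meta.txt").toPath()), StandardCharsets.UTF_8));
            batch.add(doc);

            if (batch.size() >= 1000) {   // send in batches rather than one at a time
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.commit();
        server.shutdown();
    }
}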

I've already solved the problem using the DataImportHandler (DIH) so I have
a data-config.xml that describes the templated fields and cross-walking of
the source(s) to the schema.  The original data won't always be able to be
co-located with the Solr server which is why I'm looking for another option.

I've also already solved the problem using ant and xslt to create a
temporary (and unfortunately a potentially large) document which the
UpdateHandler will accept.  I couldn't think of a solution that took
advantage of the XSLT support in the UpdateHandler because each document is
created from multiple files.  Our current (dated) Java-based solution
significantly outperforms this approach in terms of disk usage and time.  I've
rejected it based on that and gone back to the drawing board.

Does anyone have any suggestions on how I might be able to reuse my DIH
configuration in the SolrJ context without re-inventing the wheel (or DIH
in this case)?  If I'm doing something ridiculous I hope you'll point that
out too.

Thanks,
Tricia


Re: collections API error

2013-11-13 Thread Mark Miller
Try Solr 4.5.1.

https://issues.apache.org/jira/browse/SOLR-5306  Extra collection creation 
parameters like collection.configName are not being respected.

- Mark

On Nov 13, 2013, at 2:24 PM, Christopher Gross cogr...@gmail.com wrote:

 Running Apache Solr 4.5 on Tomcat 7.0.29, Java 1.6_30.  3 SolrCloud nodes
 running.  5 ZK nodes (v 3.4.5), one on each SolrCloud server, and on 2
 other servers.
 
 I want to create a collection on all 3 nodes.  I only need 1 shard.  The
 config is in Zookeeper (another collection is using it)
 
 http://solrserver:8080/solr/admin/collections?action=CREATE&name=newtest&numShards=1&replicationFactor=3&collection.configName=test
 
 I get this error (3 times, though for a different replica #)
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
 CREATEing SolrCore 'newtest_shard1_replica2': Unable to create core:
 newtest_shard1_replica2
 
 The SolrCloud Admin logs give this as the root error:
 
 Caused by: org.apache.solr.common.cloud.ZooKeeperException: Specified
 config does not exist in ZooKeeper:newtest
 
 You can see from my call that I don't want it to be called test (already
 have one) but I want to make a new instance of the test collection.
 
 This seems  pretty straightforward -- what am I missing?  Did the
 parameters change and the wiki not get updated?
 [
 http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
 ]
 
 Thanks.
 
 -- Chris



collections API error

2013-11-13 Thread Christopher Gross
Running Apache Solr 4.5 on Tomcat 7.0.29, Java 1.6_30.  3 SolrCloud nodes
running.  5 ZK nodes (v 3.4.5), one on each SolrCloud server, and on 2
other servers.

I want to create a collection on all 3 nodes.  I only need 1 shard.  The
config is in Zookeeper (another collection is using it)

http://solrserver:8080/solr/admin/collections?action=CREATE&name=newtest&numShards=1&replicationFactor=3&collection.configName=test
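
For reference, the SolrJ equivalent of that HTTP call can be made with a
generic request pointed at the Collections API (a sketch; the base URL is just
an example):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateCollectionExample {
    public static void main(String[] args) throws Exception {
        // Any node in the cluster; the URL is an example only
        HttpSolrServer server = new HttpSolrServer("http://solrserver:8080/solr");

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "CREATE");
        params.set("name", "newtest");
        params.set("numShards", 1);
        params.set("replicationFactor", 3);
        params.set("collection.configName", "test");

        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");   // hit the Collections API instead of /select

        System.out.println(server.request(request));
        server.shutdown();
    }
}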

I get this error (3 times, though for a different replica #)
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
CREATEing SolrCore 'newtest_shard1_replica2': Unable to create core:
newtest_shard1_replica2

The SolrCloud Admin logs give this as the root error:

Caused by: org.apache.solr.common.cloud.ZooKeeperException: Specified
config does not exist in ZooKeeper:newtest

You can see from my call that I don't want the new collection to be called
test (I already have one of those); I want to make a new collection that uses
the test config.

This seems  pretty straightforward -- what am I missing?  Did the
parameters change and the wiki not get updated?
 [
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
]

Thanks.

-- Chris


field collapsing performance in sharded environment

2013-11-13 Thread David Anthony Troiano
Hello,

I'm hitting a performance issue when using field collapsing in a
distributed Solr setup, and I'm wondering if others have seen it and if
anyone has an idea for working around it.

I'm using field collapsing to deduplicate documents that have the same near
duplicate hash value, and deduplicating at query time (as opposed to
filtering at index time) is a requirement.  I have a sharded setup with 10
cores (not SolrCloud), each having ~1000 documents.  Of the 10k docs,
most have a unique near duplicate hash value, so there are about 10k unique
values for the field that I'm grouping on.  The grouping parameters that
I'm using are:

group=true
group.field=near dupe hash field
group.main=true
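
For reference, the same query issued through SolrJ looks roughly like this (a
sketch; host, core and field names are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedShardQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://host1:8983/solr/s1");

        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "host1:8983/solr/s1,host2:8983/solr/s2"); // ...through s10
        q.set("group", true);
        q.set("group.field", "near_dupe_hash");
        q.set("group.main", true);   // flatten groups back into a plain doc list

        QueryResponse rsp = server.query(q);
        System.out.println("QTime=" + rsp.getQTime() + " docs=" + rsp.getResults().size());
        server.shutdown();
    }
}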

I'm attempting distributed queries (shards=s1,s2,...,s10) where the only
difference is the absence or presence of these three grouping parameters
and I'm consistently seeing a marked difference in performance (as a
representative data point, 200ms latency without grouping and 1600ms with
grouping).  Interestingly, if I put all 10k docs on the same core and query
that core independently with and without grouping, I don't see much of a
latency difference, so the performance degradation seems to exist only in
the sharded setup.

Is there a known performance issue when using field collapsing in a sharded
setup (perhaps it only manifests when the grouping field has many unique
values), or
have other people observed this?  Any ideas for a workaround?  Note that
docs in my sharded setup can only have the same signature if they're in the
same shard, so perhaps that can be used to boost perf, though I don't see
an exposed way to do so.

A follow-on question is whether we're likely to see the same issue if /
when we move to SolrCloud.

Thanks,
Dave


How to escape special characters from SOLR response header

2013-11-13 Thread Developer
I am trying to escape special characters from SOLR response header (to
prevent cross site scripting).

I couldn't find any method in SolrQueryResponse to get just the SOLR
response header. 

Can someone let me know if there is a way to modify the SOLR response
header?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-escape-special-characters-from-SOLR-response-header-tp4100772.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to escape special characters from SOLR response header

2013-11-13 Thread Erik Hatcher
I'm not quite sure what you're trying to do here, can you please elaborate with 
an example?

But, you can get the response header from a SolrQueryResponse using the 
getResponseHeader() method.
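
If the goal is to sanitize what gets written out, one possibility is a custom
last-component that walks the header and escapes string values. A rough,
untested sketch (this is not an existing Solr feature, just an illustration):

import java.io.IOException;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Rough sketch only: HTML-escape string values found in the response header.
public class EscapeHeaderComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException { }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    NamedList<Object> header = rb.rsp.getResponseHeader();
    if (header == null) return;
    for (int i = 0; i < header.size(); i++) {
      Object val = header.getVal(i);
      if (val instanceof String) {
        header.setVal(i, escape((String) val));
      }
    }
  }

  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
            .replace("\"", "&quot;").replace("'", "&#39;");
  }

  @Override
  public String getDescription() {
    return "HTML-escapes string values in the response header";
  }

  @Override
  public String getSource() {
    return null;
  }
}

It would then be registered in solrconfig.xml and listed under last-components
for the relevant request handler.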

Erik

On Nov 13, 2013, at 3:21 PM, Developer bbar...@gmail.com wrote:

 I am trying to escape special characters from SOLR response header (to
 prevent cross site scripting).
 
 I couldn't find any method in SolrQueryResponse to get just the SOLR
 response header. 
 
 Can someone let me know if there is a way to modify the SOLR response
 header?
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-escape-special-characters-from-SOLR-response-header-tp4100772.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: distributed search is significantly slower than direct search

2013-11-13 Thread Manuel Le Normand
It's surprising such a query takes this long. I would assume that after
consistently running q=*:* you should be getting cache hits and times should
be faster. Check in the admin UI how your query/document caches perform.
Moreover, the query itself just asks for the first 5000 docs that were
indexed (returning the first [docid]s), so it seems all this time is wasted on
transfer. Out of these 7 secs, how much is spent in the above method? What
fields do you return by default? How big is each doc you display in your
results? It might also be that both collections work on the same resources.
Try elaborating on your use case.

Anyway, it seems like you just made a test to see what the performance hit
would be in a distributed environment, so I'll try to explain some things we
encountered in our benchmarks, with a case that is at least similar in the
number of docs fetched.

We retrieve 2000 docs on every query, running over 40 shards. This means every
shard actually transfers 2000 docs to our frontend on every
document-match request (the first phase you were referring to). Even if lazily
loaded, reading 2000 ids (on 40 servers) and lazy-loading the fields is a
tough job. Waiting for the slowest shard to respond, then sorting the docs
and reloading (lazily or not) the top 2000 docs can take a long time.

Our times are 4-8 secs, but it's not really possible to compare the cases.
We've done a few steps that improved it along the way, steps that led to
others. These were our starting points:

   1. Profile these queries from different servers and Solr instances; try to
   put your finger on which collection is working hard and why. Check whether
   you're stuck on components that add no value for you but are enabled by
   default.
   2. Consider eliminating the document cache. It loads lots of (partly) lazy
   documents whose probability of secondary use is low. There's no such thing
   as popular docs when you request this many docs. You may be able to use
   your memory in a better way.
   3. Bottleneck check - server metrics such as CPU user time / iowait, packets
   transferred over the network, page faults etc. are excellent for
   understanding whether the disk, network or CPU is slowing you down. Then
   upgrade the hardware in one of the shards and check whether it helps by
   comparing the upgraded shard's qTime to the others.
   4. Warm up the index after committing - benchmark how queries perform before
   and after some warm-up, say a few hundred queries (from your previous
   system), in order to warm up the OS cache (assuming you're using
   NRTDirectoryFactory).
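
For example, here is a minimal SolrJ sketch of the kind of diagnostic query
Erick suggests below - fetching only the id field across the shards and
comparing QTime to total elapsed time (host and core names are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedIdFetch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://127.0.0.1:8983/solr/template");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(5000);
        q.setFields("id");                              // fetch only the id field
        q.set("shards", "127.0.0.1:8983/solr/core1");   // add more shards as needed

        QueryResponse rsp = server.query(q);
        System.out.println("QTime=" + rsp.getQTime()
            + " elapsed=" + rsp.getElapsedTime() + "ms"
            + " docs=" + rsp.getResults().size());
        server.shutdown();
    }
}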


Good luck,
Manu


On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson erickerick...@gmail.comwrote:

 One thing you can try, and this is more diagnostic than a cure, is return
 just
 the id field (and insure that lazy field loading is true). That'll tell you
 whether
 the issue is actually fetching the document off disk and decompressing,
 although
 frankly that's unlikely since you can get your 5,000 rows from a single
 machine
 quickly.

 The code you found where Solr is spending its time, is that on the
 routing core
 or on the shards? I actually have a hard time understanding how that
 code could take a long time, doesn't seem right.

 You are transferring 5,000 docs across the network, so it's possible that
 your network is just slow, that's certainly a difference between the local
 and remote case, but that's a stab in the dark.

 Not much help I know,
 Erick



 On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir elr...@checkpoint.com wrote:

  Erick, Thanks for your response.
 
  We are upgrading our system using Solr.
  We need to preserve old functionality.  Our client displays 5K document
  and groups them.
 
  Is there a way to refactor code in order to improve distributed documents
  fetching?
 
  Thanks.
 
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Wednesday, October 30, 2013 3:17 AM
  To: solr-user@lucene.apache.org
  Subject: Re: distributed search is significantly slower than direct
 search
 
  You can't. There will inevitably be some overhead in the distributed
 case.
  That said, 7 seconds is quite long.
 
  5,000 rows is excessive, and probably where your issue is. You're having
  to go out and fetch the docs across the wire. Perhaps there is some
  batching that could be done there, I don't know whether this is one
  document per request or not.
 
  Why 5K docs?
 
  Best,
  Erick
 
 
  On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir elr...@checkpoint.com
 wrote:
 
   Hi all,
  
   I am using Solr 4.4 with multi cores. One core (called template) is my
   routing core.
  
   When I run
    http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
   it consistently takes about 7s.
    When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
   consistently takes about 40ms.
  
   I profiled the distributed query.
   This is the distributed query process (I hope the terms 

Re: High disk IO during UpdateCSV

2013-11-13 Thread Utkarsh Sengar
Thanks guys!
I will start by splitting the file into chunks of 5M documents (10 chunks) and
reduce the chunk size further if needed.
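
For illustration, a chunked load with a commit per chunk could look roughly
like this in SolrJ (a sketch; it assumes a /update/csv handler is configured
and the paths and URL are just examples):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ChunkedCsvLoader {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Chunk files produced beforehand, e.g. with "split -l 5000000 data.csv chunk_"
        File[] chunks = new File("/data/csv-chunks").listFiles();
        for (File chunk : chunks) {
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
            req.addFile(chunk, "text/csv");
            // commit after each chunk so flushes and merges are spread out over time
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            server.request(req);
        }
        server.shutdown();
    }
}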

Thanks,
-Utkarsh


On Wed, Nov 13, 2013 at 9:08 AM, Walter Underwood wun...@wunderwood.orgwrote:

 Don't load 50M documents in one shot. Break it up into reasonable chunks
 (100K?) with commits at each point.

 You will have a bottleneck somewhere, usually disk or CPU. Yours appears
 to be disk. If you get faster disks, it might become the CPU.

 wunder

 On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:

  Bumping this one again, any suggestions?
 
 
  On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com
 wrote:
 
  Hello,
 
  I load data from csv to solr via UpdateCSV. There are about 50M
 documents
  with 10 columns in each document. The index size is about 15GB and I am
  using a 3 node distributed solr cluster.
 
  While loading the data the disk IO goes to 100%. if the load balancer in
  front of solr hits the machine which is doing the processing then the
  request times out. But in general, requests to all the machines become
  slow. I have attached a screenshot of the diskI/O and CPU usage.
 
  Is there a fix in solr which can possibly throttle the load or maybe its
  due to MergePolicy? How can I debug solr to get the exact cause?
 
  --
  Thanks,
  -Utkarsh
 
 
 
 
  --
  Thanks,
  -Utkarsh

 --
 Walter Underwood
 wun...@wunderwood.org






-- 
Thanks,
-Utkarsh


Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Robert Muir
which example? there are so many.

On Wed, Nov 13, 2013 at 1:00 PM, Mark Miller markrmil...@gmail.com wrote:
 RE: the example folder

 It’s something I’ve been pushing towards moving away from for a long time - 
 see https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to 
 'server' and pull examples into an 'examples’ directory

 Part of a push I’ve been on to own the Container level (people are now on 
 board with that for 5.0), add start scripts, and other niceties that we 
 should have but don’t yet.

 Even our config files should move away from being an “example” and end up 
 more like a default starting template. Like a database, it should be simple 
 to create a collection without needing to deal with config - you want to deal 
 with the config when you need to, not face it all up front every time it is 
 time to create a new collection.

 IMO, the name example is historical - most people already use it this way, 
 the name just confuses matters.

 - Mark


 On Nov 13, 2013, at 12:30 PM, Shawn Heisey s...@elyograg.org wrote:

 On 11/13/2013 5:29 AM, Dmitry Kan wrote:
 Reading that people have considered deploying example folder is slightly
 strange to me. No wonder they are confused and confuse their ops.

 I do use the stripped jetty included in the example, but my setup is not a 
 straight copy of the example directory. I removed a lot of it and changed 
 how jars get loaded.  I built my own init script from scratch, tailored for 
 my setup.

 I'll start a new thread with my init script and some info about how I 
 installed Solr.

 Thanks,
 Shawn




queries including time zone

2013-11-13 Thread Eric Katherman
Can anybody provide any insight about using the tz param? It doesn't seem to
affect date math and /DAY rounding.  What format do the tz values need to be
in?  I'm not finding any documentation on this.

Sample query we're using:

path=/select 
params={tz=America/Chicago&sort=id+desc&start=0&q=application_id:51b30ed9bc571bd96773f09c+AND+object_key:object_26+AND+values_field_215_date:[*+TO+NOW/DAY%2B1DAY]&wt=json&rows=25}

Thanks!
Eric

Re: queries including time zone

2013-11-13 Thread Jack Krupansky

I believe it is the TZ column from this table:
http://en.wikipedia.org/wiki/List_of_tz_database_time_zones

Yeah, it's on my TODO list for my book.

I suspect that tz will not affect NOW, which is probably UTC. I suspect 
that tz only affects literal dates in date math.
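
For what it's worth, the parameter is spelled TZ (upper case) on the request.
A minimal, untested SolrJ sketch of passing it (URL taken as an example, field
name taken from the query above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TzQueryExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("values_field_215_date:[* TO NOW/DAY+1DAY]");
        q.set("TZ", "America/Chicago");   // the time zone is passed as the TZ request parameter
        q.setRows(25);

        System.out.println(server.query(q).getResults().getNumFound());
        server.shutdown();
    }
}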


-- Jack Krupansky

-Original Message- 
From: Eric Katherman

Sent: Wednesday, November 13, 2013 11:38 PM
To: solr-user@lucene.apache.org
Subject: queries including time zone

Can anybody provide any insight about using the tz param? The behavior of 
this isn't affecting date math and /day rounding.  What format does the tz 
variables need to be in?  Not finding any documentation on this.


Sample query we're using:

path=/select 
params={tz=America/Chicago&sort=id+desc&start=0&q=application_id:51b30ed9bc571bd96773f09c+AND+object_key:object_26+AND+values_field_215_date:[*+TO+NOW/DAY%2B1DAY]&wt=json&rows=25}


Thanks!
Eric



(info)about lucene search performents

2013-11-13 Thread Jacky.J.Wang (mis.cnsh04.Newegg) 41361
Dear lucene

I have a question about Lucene search performance: the first search is very
slow and a second identical search is very quick.
I use MMapDirectoryFactory in solrconfig.xml (I have already disabled all Solr
caches for testing Lucene search performance).

When mmap() is called, the kernel only builds the logical-to-physical address
mapping table; no data is actually mapped into memory yet.

madvise() should be used together with mmap(), but MMapDirectoryFactory has no
madvise option.

I found a JIRA issue (LUCENE-3178), but I don't know whether it can solve this
problem.






