Re: Newbie SOLR question

2013-08-30 Thread Атанас Атанасов
Thanks for the response. Your suggestion is to keep the existing way of
indexing data, where every page of a document is a row in the SOLR database,
changing the content field to be store-only and adding another field (e.g.
document_content) for indexing only, into which I should put the whole content
of the document. This is a good idea, but I am also using the Highlighter and I
think it won't work, since it requires the field to be stored=true. My
problem would be solved if there were a way to search in the index-only field,
where the whole document is indexed, but get the highlights/context of
the match from the existing page.
Originally my idea was to keep the data in the existing format (1 page = 1 record)
but somehow search in grouped (by document) results, or some kind of union
between the pages of a document. Is this possible?


On Thu, Aug 29, 2013 at 4:45 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Assuming you want both pages to match you need the text to be present on
 both pages. Do you actually return/store text of the page in Solr? If so,
 you can have that 'page' field store-only and have another field which is
 index-only and into which you put all your matching logic. So, that
 index-only field can contain the page plus another line/paragraph/page on
 each side.

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Thu, Aug 29, 2013 at 2:49 PM, Alexandre Rafalovitch
  arafa...@gmail.com wrote:

  So, if the match spans pages 4 and 5, what do you want returned? Page 4,
  page 5, or both?
 
  Regards,
   Alex
  On 28 Aug 2013 06:55, Атанас Атанасов atanaso...@gmail.com wrote:
 
  Hello,
 
   My name is Atanas Atanasov. I've been using SOLR 1.4/3.5/4.3 for a year
   and a half and I'm really satisfied with what it provides. Searching and
   indexing are extremely fast, and it is easy to work with.
   However, I ran into a small problem and I can't figure it out.
   I'm using SOLR to store the content/text of different types of
   documents (.pdf, .txt, .doc, etc.).
   The whole document content represents a SOLR record (all the text from
   all pages of the document).
   schema.xml is in the SOLR_Document_Level folder of the attached .zip file.
   This worked absolutely fine, but I wanted to see the exact page(s) of a
   document where the search matches are.
 
   I redesigned it so that every page of a document is a row in the SOLR
   database (schema.xml is in the SOLR_Page_Level folder of the attached
   .zip file) and it works well, but this resulted in the following problem:
   Example: I search for (lucene AND apache). If both words are on the same
   page, I will get a hit and a result will be returned. However, if the
   words are on different pages of a document, no results will be found.
   My goal is to find out the exact page of a document where the match is.
   Dynamic fields would solve this problem, but there are very big documents
   with many pages, so I don't think this is a solution.
  Can you help me with some ideas on how to make it work?
 
  Just for information. I am using SOLR as a REST service hosted in Apache
  and a .NET application to work with it.
  If you have questions please feel free to ask.
 
  Thanks in advance and Best Regards,
  Atanas Atanasov
 
 



Re :Re: [SOLR 4.4 or 4.2] indexing with dih and solrcloud

2013-08-30 Thread jerome . dupont

Hello again

Finally, I found the problem.
It seems that:
_ The indexing request was done with an HTTP GET and not with a POST,
because I was launching it from a bookmark in my browser.
Launching the indexing of my documents from the admin interface made
indexing work.
_ Another problem was that some documents are not indexed (in particular
the first ones in the list) for some reason (due to our configuration), so when
I was trying with the ten first documents, it couldn't work.

Now I will try with 2 shards...

Jerome



Re: how to delete ubb code when solr indexes documents

2013-08-30 Thread Upayavira
If you can't do it before the content gets to Solr, which would be best,
then use the ScriptUpdateProcessor and code up some JavaScript to remove
it. Or, if the pattern is regular enough, you might be able to use the
RegexpUpdateProcessor.
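
For the regex route, here is a minimal sketch of the kind of pattern involved,
shown in plain Python rather than a Solr update chain; the tag pattern is an
assumption based on the sample document in this thread, so real-world markup
may need a stricter expression:

```python
import re

# Matches UBB/BBCode-style tags such as [font=Tahoma], [/font],
# [color=#33]. Assumed tag shape; adjust for your actual markup.
BBCODE_TAG = re.compile(r"\[/?\w+(?:=[^\]]*)?\]")

def strip_bbcode(text: str) -> str:
    # Remove every tag, then trim leftover surrounding whitespace.
    return BBCODE_TAG.sub("", text).strip()

raw = ("[backcolor=#ff][color=#33][font=Tahoma]"
       " the document content [/font][/color][/backcolor]")
print(strip_bbcode(raw))  # the document content
```

The same pattern could in principle be used from a script processor, but
doing the cleanup before the content reaches Solr keeps the index pipeline
simpler.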

Upayavira

On Fri, Aug 30, 2013, at 04:24 AM, vincent wrote:
 The original document:
 
 [backcolor=#ff][color=#33][font=Tahoma] the document content  
 [/font][/color][/backcolor]
 
 I want to delete the UBB code in this document when Solr indexes the
 document, leaving just:
 
 the document content. 
 
 -- 
 vincent
 


Re: apache tomcat 7 doesn't deploy tomcat 4.4

2013-08-30 Thread Carmine Paternoster
I have re-deployed everything and it is OK, thank you all for the support :-)
Carmine


2013/8/30 Erick Erickson erickerick...@gmail.com

 Hmmm, I'd be glad to give you edit rights if you'd like to
 update with your current experiences, just create yourself
 a login and ping the list with your Wiki logon name and we'll
 be happy to add you to the list.

 Best
 Erick


 On Thu, Aug 29, 2013 at 12:54 PM, Jared Griffith 
 jgriff...@picsauditing.com
  wrote:

  Since we are on the topic: I noticed that the wiki's Tomcat setup is
  horribly outdated and pretty much useless for the current Solr version.
 
 
  On Thu, Aug 29, 2013 at 9:35 AM, Shawn Heisey s...@elyograg.org wrote:
 
   On 8/29/2013 10:09 AM, Carmine Paternoster wrote:
  
    Thank you Shawn, but the logging is correctly configured, because the
    INFO message is logged, isn't it? Any other suggestion?
  
  
    I don't know what that INFO message means. Can you use a paste website
    (http://apaste.info being one example) and share exactly what's in your
    Solr log? You mentioned log4j.properties, so this might be different
    from your Tomcat log.
  
   Thanks,
   Shawn
  
  
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 



Failure to open existing log file On HDFS

2013-08-30 Thread YouPeng Yang
Hi Solr users,

   I'm testing Solr with HDFS. I happened to stop my HDFS before stopping
Solr. After that I started Solr again. An exception came out [1]
that I could not ignore.
   Can anybody explain the reason and how to avoid it?

Regards

[1]=
12326 [coreLoadExecutor-4-thread-1] ERROR org.apache.solr.update.UpdateLog
?.Failure to open existing log file (non fatal)
hdfs://lklcluster/solrdata/tlog/tlog.018:org.apache.solr.common.SolrException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.RecoveryInProgressException):
Failed to close file /solrdata/tlog/tlog.018. Lease
recovery is in progress. Try again later.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2017)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1828)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2069)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2046)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:445)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:285)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40734)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at
org.apache.solr.update.HdfsTransactionLog.init(HdfsTransactionLog.java:118)
at org.apache.solr.update.HdfsUpdateLog.init(HdfsUpdateLog.java:166)
at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:134)
at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:94)
at
org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:96)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:537)
at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:607)
at org.apache.solr.core.SolrCore.init(SolrCore.java:819)
at org.apache.solr.core.SolrCore.init(SolrCore.java:629)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:270)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.RecoveryInProgressException):
Failed to close file /solrdata/tlog/tlog.018. Lease
recovery is in progress. Try again later.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2017)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1828)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2069)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2046)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:445)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:285)
at

Solr 4.0 Functions in FL: performance?

2013-08-30 Thread Cristian Cascetta
Hello,

when I put a function in the field list (fl), when are the field values
calculated, and on which docs?

Are the values calculated over the whole set of docs? Only over the matching
set of docs? Or, better, only over the docs actually serialized in the results?

E.g. I have 1000 docs in the index, 100 docs matching my query, and 10 docs in
my result response because I put rows=10 in my query.

I put fl=sum(fieldA,fieldB) in my query.

How many times is the sum fieldA+fieldB executed?

thx,
c.


Re: Newbie SOLR question

2013-08-30 Thread Aloke Ghoshal
Hi,

Please refer to my response from a few months back:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3ccaht6s2az_w2av04rdmoeeck5e9o0k4ytktf0pjsecsh-lls...@mail.gmail.com%3E

Our modelling is to index N (individual pages) + 1 (original document) in
Solr. Once a document has matched for a given set of terms, the
corresponding page boundary cases can be handled by relaxing the page
search condition to an OR (you could even add these alongside with a lower
boost).
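
As an illustration of that two-pass idea, here is a client-side sketch with
hypothetical field names (document_content and content are assumptions, not
our actual schema): first require all terms at the whole-document level, then
relax to OR against the page-level records to locate and highlight the pages.

```python
def build_queries(terms, doc_field="document_content", page_field="content"):
    # Document-level query: every term must appear somewhere in the document.
    doc_q = " AND ".join(f"{doc_field}:{t}" for t in terms)
    # Page-level query: relaxed to OR so each page involved in the
    # document-level match can be retrieved and highlighted.
    page_q = " OR ".join(f"{page_field}:{t}" for t in terms)
    return doc_q, page_q

doc_q, page_q = build_queries(["lucene", "apache"])
print(doc_q)   # document_content:lucene AND document_content:apache
print(page_q)  # content:lucene OR content:apache
```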

Regards,
Aloke



On Fri, Aug 30, 2013 at 12:11 PM, Атанас Атанасов atanaso...@gmail.com wrote:



Re: Solr 4.4 problem with loading DisMaxRequestHandler

2013-08-30 Thread danielitos85
thanks a lot ;)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-4-problem-with-loading-DisMaxRequestHandler-tp4085842p4087449.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 Functions in FL: performance?

2013-08-30 Thread Andrea Gazzarini

Hi,
I'm not actually sure I got the point, but:


Are values calculated over the whole set of docs? Only over the resulting

set of doc? Or, better, over the docs actually serialized in results.

The third: a function is like a virtual field, computed in real time and 
associated with each (returned) doc.



i.e. I have 1000 docs in the index, 100 docs matching my query, 10 docs in

my result response because i put rows=10 in my query.
I put fl=sum(fieldA,fieldB) in my query.
How many times is the sum of fieldA+fieldB executed?

10

Best,
Andrea
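
A toy model of that behavior (pure-Python illustration of when the function
runs, not Solr internals):

```python
def respond(matching_docs, rows, fn):
    # Model of the point above: the pseudo-field is evaluated only for the
    # docs actually serialized into the response (rows), not for every
    # match, and certainly not for the whole index.
    calls = 0
    page = []
    for doc in matching_docs[:rows]:
        calls += 1
        page.append({**doc, "sum": fn(doc)})
    return page, calls

# The 1000 docs in the index are irrelevant; 100 match; rows=10 are returned.
matches = [{"fieldA": i, "fieldB": 2 * i} for i in range(100)]
page, calls = respond(matches, 10, lambda d: d["fieldA"] + d["fieldB"])
print(calls)  # 10
```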

On 08/30/2013 09:59 AM, Cristian Cascetta wrote:





Re: Solr 4.0 Functions in FL: performance?

2013-08-30 Thread Cristian Cascetta
Whoa, that's cool!

It can simplify many front-end calculations. Obviously I don't want to
use it just to make a simple sum :)

Thanks!

c.


2013/8/30 Andrea Gazzarini andrea.gazzar...@gmail.com





Re: SolrCloud - Path must not end with / character

2013-08-30 Thread Prasi S
I'm still clueless about where the issue could be. There is not much
information in the Solr logs.

I had a running version of the cloud setup on another server. I copied the
same to this server, started ZooKeeper, and then ran the commands below:

java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig
-zkhost localhost:2181 -confdir solr-conf -confname solrconfindex

java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig
-zkhost 127.0.0.1:2181 -collection colindexer -confname solrconfindex
-solrhome ../tomcat1/solr1

After this, when I started Tomcat, the first Tomcat started fine. When the
second Tomcat was started, I got the above exception and it stopped. Then the
first Tomcat also showed the same exception.




On Thu, Aug 29, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.com wrote:

 Yeah, you see this when the core could not be created. Check the logs to
 see if you can find something more useful.

 I ran into this again the other day - it's something we should fix. You
 see the same thing in the UI when a core cannot be created and it gives you
 no hint about the problem and is confusing.

 - Mark

 On Aug 29, 2013, at 5:23 AM, sathish_ix skandhasw...@inautix.co.in
 wrote:

  Hi ,
 
  Check your configuration files uploaded into zookeeper is valid and no
 error
  in config files uploaded.
  I think due to this error, solr core will not be created.
 
  Thanks,
  Sathish
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-Path-must-not-end-with-character-tp4087159p4087182.html
  Sent from the Solr - User mailing list archive at Nabble.com.




Re: SolrCloud - Path must not end with / character

2013-08-30 Thread Prasi S
Below is the script I run:

START /MAX
F:\SolrCloud\zookeeper\zk-server-1\zookeeper-3.4.5\bin\zkServer.cmd


START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir
solr-conf -confname solrconf1



START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost
127.0.0.1:2182 -collection firstcollection -confname solrconf1
-solrhome ../tomcat1/solr1



START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir
solr-conf -confname solrconf2




START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost
127.0.0.1:2182 -collection seccollection -confname solrconf2 -solrhome
../tomcat1/solr1



START /MAX F:\solrcloud\tomcat1\bin\startup.bat



START /MAX F:\solrcloud\tomcat2\bin\startup.bat


On Fri, Aug 30, 2013 at 4:07 PM, Prasi S prasi1...@gmail.com wrote:




Re: regex constructs allowed in queries

2013-08-30 Thread Hugh Cayless
Yeah, upon re-reading the grammar, you're quite right: backslash escapes are 
legal, but it doesn't look like there's any concept of a predefined 
character class. For whatever reason, it looks to me like Lucene is just 
implementing its own regex matching, which may make complete sense for reasons 
of optimization, and the supported syntax is minimal and a little idiosyncratic. 
I can probably work around this now that I know what's happening. Thanks!
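
One possible workaround, sketched in Python and purely an assumption about
what the patterns in question need: pre-expand the Java-style predefined
classes into explicit character classes before sending the pattern to Solr.
The expansions below are deliberately ASCII-only simplifications.

```python
# Hypothetical pre-processing step: rewrite predefined character classes
# (which Lucene's RegExp syntax lacks) into explicit classes it accepts.
EXPANSIONS = {
    r"\d": "[0-9]",
    r"\w": "[a-zA-Z0-9_]",
    r"\s": "[ \\t\\n]",
}

def to_lucene_regex(pattern: str) -> str:
    # Plain textual substitution; a real converter would need to skip
    # escaped brackets and content already inside character classes.
    for shorthand, explicit in EXPANSIONS.items():
        pattern = pattern.replace(shorthand, explicit)
    return pattern

print(to_lucene_regex(r"foo\d+"))  # foo[0-9]+
```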

Hugh

On Aug 29, 2013, at 14:56, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 Thanks for the full details -- being able to see exactly how the queries 
 are received and parsed is important for ruling out simple things like 
 client-side escaping (or lack of it) and server-side metacharacter handling 
 in the query parser.
 
 : Some things work the way I'd expect, some clearly don't. So my question 
 : was, in the first instance Is there full regex support? Clearly, 
 : there's supposed to be, so something is wrong, or I don't know the right 
 : escape syntax.
 
 I think it depends on your definition of "full"?
 
 Based on the docs for the supported syntax, it doesn't look to me like 
 there is any direct support for any of the pre-defined character 
 classes (ie: \s, \w, etc.) or boundary matchers (ie: ^, \b, 
 etc.)
 
 https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/util/automaton/RegExp.html
 
 It looks like there are hooks in the underlying RegExp and RegexpQuery 
 classes for registering named Automatons so the ... syntax can refer 
 to named character classes that are defined at runtime, but none are 
 registered for you by default, and there is no way to configure that with 
 the QueryParser.
 
 I'm sorry, but I don't understand the RegExp automaton stuff well enough to 
 understand why the predefined character classes from java.util.regex.Pattern 
 aren't supported by default, or even how you could conceptually implement 
 the boundary matchers using the underlying Java API.
 
 I suspect the existing regex query support isn't going to work for what 
 you're trying to do.
 
 
 -Hoss



Re: Newbie SOLR question

2013-08-30 Thread Атанас Атанасов
Thanks a lot!

I don't know how I missed this discussion.
Thank you again!

Best regards
Atanas


On Fri, Aug 30, 2013 at 11:31 AM, Aloke Ghoshal alghos...@gmail.com wrote:



Re: cores/shards with no leader

2013-08-30 Thread Srivatsan
I am also facing this issue. I am using Solr 4.3.0.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/cores-shards-with-no-leader-tp4087323p4087478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud - Path must not end with / character

2013-08-30 Thread Prasi S
Also, this fails with the default downloaded Solr 4.4 configuration too.


On Fri, Aug 30, 2013 at 4:19 PM, Prasi S prasi1...@gmail.com wrote:

 Below is the script i run

 START /MAX
 F:\SolrCloud\zookeeper\zk-server-1\zookeeper-3.4.5\bin\zkServer.cmd


 START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
 org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir
 solr-conf -confname solrconf1



 START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
 org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182-collection 
 firstcollection -confname solrconf1 -solrhome ../tomcat1/solr1



 START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
 org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir
 solr-conf -confname solrconf2




 START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
 org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182-collection 
 seccollection -confname solrconf2 -solrhome ../tomcat1/solr1



 START /MAX F:\solrcloud\tomcat1\bin\startup.bat



 START /MAX F:\solrcloud\tomcat2\bin\startup.bat


 On Fri, Aug 30, 2013 at 4:07 PM, Prasi S prasi1...@gmail.com wrote:

 Im still clueless on where the issue could be. There is no much
 information in the solr logs.

 i had a running version of cloud in another server. I have copied the
 same to this server, and started zookeeper, then ran teh below commands,

 java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig
 -zkhost localhost:2181 -confdir solr-conf -confname solrconfindex

 java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig
 -zkhost 127.0.0.1:2181 -collection colindexer -confname solrconfindex
 -solrhome ../tomcat1/solr1

 After this, when i started tomcat, the first tomcat starts fine. When the
 second tomcat is started, i get the above exception and it stops. Tehn the
 first tomcat also shows teh same exception.




 On Thu, Aug 29, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.comwrote:

 Yeah, you see this when the core could not be created. Check the logs to
 see if you can find something more useful.

 I ran into this again the other day - it's something we should fix. You
 see the same thing in the UI when a core cannot be created and it gives you
 no hint about the problem and is confusing.

 - Mark

 On Aug 29, 2013, at 5:23 AM, sathish_ix skandhasw...@inautix.co.in
 wrote:

  Hi ,
 
  Check that the configuration files you uploaded into ZooKeeper are valid
  and contain no errors.
  I think that due to this error the Solr core will not be created.
 
  Thanks,
  Sathish
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-Path-must-not-end-with-character-tp4087159p4087182.html
  Sent from the Solr - User mailing list archive at Nabble.com.






Complex group request

2013-08-30 Thread Per Steffensen

Hi

I want to do a fairly complex grouping request against Solr. Let's say
that I have fields field1 and timestamp on all my documents.


In the request I want to provide a set of time-intervals, and for each
distinct value of field1 I want a count of how many of the time-intervals
contain at least one document where field1 has that distinct value. Smells
like grouping, but with more advanced counting.


Example
Documents in Solr
field1 | timestamp
a      | 1
a      | 2
b      | 1
a      | 3
c      | 5
a      | 10
b      | 12
b      | 11
a      | 13
d      | 14

Doing a query with the following time-intervals (both ends included)
time-interval#1: 1 to 2
time-interval#2: 3 to 5
time-interval#3: 6 to 12

I would like to get the following result
field1-value | count
a  | 3
b  | 2
c  | 1
Reasons
* field1-value a: Count=3, because there is a document with field1=a and 
a timestamp between 1 to 2 (actually there are 2 such documents, but we 
only count in how many time-intervals a is present and do not consider 
how many times a is present in that interval), AND because there is a 
document with field1=a and a timestamp between 3 and 5, AND because 
there is a document with field1=a and a timestamp between 6 and 12
* field1-value b: Count=2, because there is at least one document with 
field1=b in time-interval#1 AND time-interval#3 (there is no document 
with field1=b in time-interval#2)
* field1-value c: Count=1, because there is at least one document with 
field1=c in time-interval#2 (there is no document with field1=c in 
neither time-interval#1 nor time-interval#3)
* No field1-value=d in the result-set, because d is not present in any of
the time-intervals.
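The counting rule above can be made concrete with a small client-side sketch. This is plain Python over the example data, not a Solr query; one practical (assumed, not from the thread) approach is to run one facet on field1 per interval (fq=timestamp:[lo TO hi]) and combine the per-interval value sets exactly like this:

```python
from collections import defaultdict

# Documents from the example above: (field1, timestamp)
docs = [("a", 1), ("a", 2), ("b", 1), ("a", 3), ("c", 5),
        ("a", 10), ("b", 12), ("b", 11), ("a", 13), ("d", 14)]

# Time-intervals, both ends included
intervals = [(1, 2), (3, 5), (6, 12)]

def interval_presence_counts(docs, intervals):
    """For each field1 value, count in how many intervals it has >= 1 doc."""
    present = defaultdict(set)  # field1 value -> set of interval indices
    for value, ts in docs:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= ts <= hi:
                present[value].add(i)
    # Duplicates within an interval collapse via the set; values matching
    # no interval (like d) never appear in the result.
    return {v: len(idxs) for v, idxs in present.items()}

print(interval_presence_counts(docs, intervals))
# -> {'a': 3, 'b': 2, 'c': 1}
```

This reproduces the expected result (a=3, b=2, c=1, no d), but note it pulls all matching field1 values to the client, which may not scale in a distributed setup.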


The query part of the request probably needs to be
* q=timestamp:([1 TO 2]) OR timestamp:([3 TO 5]) OR timestamp:([6 TO 12])
but if I just add the following to the request
* group=true
* group.field=field1
* group.limit=1 (strange that you cannot set this to 0, BTW - I am not
interested in any of the documents themselves)

I will get the following result
field1/group-value | count
a                  | 4 (because there is a total of 4 documents with field1=a in those time-intervals)
b                  | 3
c                  | 1

1) Is it possible for me to create a request that will produce the 
result I want?

2) If yes to 1), how? What will the request look like?
3) If yes to 1), will it work in a distributed SolrCloud setup?
4) If yes to 1), will it perform?
5) If no to 1), is there a fairly simple Solr-code-change I can do in 
order to make it possible? You do not have to hand me the solution, but 
a few comments on how easy/hard it would be, and ideas on how to attack 
the challenge would be nice.


Thanks!

Regards, Per Steffensen


Re: Collection - loadOnStartup

2013-08-30 Thread Srivatsan
I started looking into reducing the time taken to load cores during a cluster
restart. When initializing a core, building the config takes a considerable
amount of time. In our case, the schema remains the same for all collections
(cores), so it could be kept static to avoid that loading time.

In the meanwhile I have gone through the schemaless feature of Solr 4.5. In
such a case, do we need to define a schema.xml file, or will it automatically
assign field types dynamically? If there is no need to define a schema.xml
file, then core initialization should automatically take less time, right?


Thanks in advance

Srivatsan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-loadOnStartup-tp4082531p4087510.html
Sent from the Solr - User mailing list archive at Nabble.com.


Change the score of a document based on the *value* of a multifield using dismax

2013-08-30 Thread danielitos85
Hi guys,

I need to change the score of a document based on the value of a multivalued
field. I thought that maybe I need to set a boost at index time (but I'm not
sure). Let me explain my situation:

- I'm using Solr 4.4
- I'm indexing data using DataImportHandler with an RDBMS
- my documents are a particular kind of places
- my fields are places and their reviews and descriptions
- my multivalued field is distance_place, because each place (one place is a
field) has a lot of reviews or descriptions

I'm trying the following request, but it returns the error "can not use
FieldCache on multivalued field: distance_place":

http://localhost:8983/solr/myCore/select?q={!boost b=distance_place v=$qq defType=dismax}&qq=pizza&fl=*,score&qf=text_review+text_description

I'm thinking that if I set the boost on each review/description with the
value of distance_place at index time it would be easy, but in that case I
don't know the right syntax to set the boost in the db-dataimport.xml
file.

Any suggestions, please?
Thanks in advance.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-using-dismax-tp4087503.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Change the score of a document based on the *value* of a multifield using dismax

2013-08-30 Thread Erick Erickson
bq: I need to change the score of a document based on the value of a
multifield

This is contradictory in the sense that there is no value of a
multifield. That is,
which of the entries in a multiValued field is the right one to use?
There's no
syntax for doing a boost function on the first thing in the field, or
the third
thing in the field etc.

So tell us a little more about distance_place and what you want to use it
for.
Boosting by distance has a much different syntax, see:
http://wiki.apache.org/solr/SpatialSearch#How_to_boost_closest_results

Best
Erick
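As an illustration of the recip-based distance boost the wiki page above describes, Solr's recip(x, m, a, b) function query computes a / (m*x + b), so the boost decays as distance grows (the parameter values 2, 200, 20 here are just illustrative, not a recommendation):

```python
def recip(x, m, a, b):
    """Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)

# With a boost like recip(geodist(),2,200,20), closer documents
# get a larger multiplier:
for km in (0, 10, 100):
    print(km, recip(km, 2, 200, 20))
# 0 10.0
# 10 5.0
# 100 0.9090909090909091
```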


On Fri, Aug 30, 2013 at 9:57 AM, danielitos85 danydany@gmail.comwrote:

 Hi guys,

 I need to change the score of a document based on the value of a
 multifield.
 I thoght that maybe I need to set boost in index-time (but I don't sure).
 Now I explain you my situation:

 - I'm usng solr4.4
 - I'm index data using dataimporthandler with rdbms
 - my documents are a particular kind of places
 - my field are places and their review and description
 - my multifield is distance_place beacause each place (one place is a
 field)
 has a lot of review or description

 I'm tring with the following request but it return error can not use
 FieldCache on multivalued field: distance_place

 http://localhost:8983/solr/myCore/select?q={!boost b=distance_place v=$qq defType=dismax}&qq=pizza&fl=*,score&qf=text_review+text_description

 I'm thinking that if I set the boost at each review/description with the
 value of the distance_place in index-time it is easy, but in this case I
 don't know which is the right syntax to set the boost into
 db-dataimport.xml
 file.

 Please, Any suggests?
 Thanks in advance.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-using-dismax-tp4087503.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Order of fields in a search query.

2013-08-30 Thread Deepak Konidena
Does the order of fields matter in a lucene query?

For instance,

q = A && B && C

Lets say A appears in a million documents, B in 1, C in 1000.

while the results would be identical irrespective of the order in which you
AND
A, B and C, will the response times of the following queries differ in any
way?

C && B && A
A && B && C

Does Lucene/Solr pick the best query execution plan in terms of both space
and time for a given query?

-Deepak
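A toy sketch of why the written operand order usually doesn't change the cost much: a conjunction over sorted posting lists can pick its own evaluation order, typically driving with the rarest term. This is only an illustration of the idea, not Lucene's actual scorer:

```python
def intersect_rarest_first(posting_lists):
    """Intersect sorted doc-id lists, driving with the shortest (rarest) list."""
    ordered = sorted(posting_lists, key=len)  # rarest term first
    result = set(ordered[0])
    for other in ordered[1:]:
        result &= set(other)
        if not result:  # early exit once the intersection is empty
            break
    return sorted(result)

a = list(range(1_000_000))           # "A": matches a million docs
b = list(range(0, 1_000_000, 100))   # "B": matches 10,000 docs
c = [50, 123, 500, 5000]             # "C": matches 4 docs

# Same result (and same work) regardless of how the query was written:
print(intersect_rarest_first([a, b, c]) == intersect_rarest_first([c, b, a]))  # True
```

Because the executor reorders internally, `A && B && C` and `C && B && A` end up doing essentially the same work.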


Re: apache tomcat 7 doesn't deploy tomcat 4.4

2013-08-30 Thread Jared Griffith
I didn't get SolrCloud to work correctly - I was getting the Path must not
end with / character error - so I could only do this for a stand-alone instance.


On Thu, Aug 29, 2013 at 4:43 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hmmm, I'd be glad to give you edit rights if you'd like to
 update with your current experiences, just create yourself
 a login and ping the list with your Wiki logon name and we'll
 be happy to add you to the list.

 Best
 Erick


 On Thu, Aug 29, 2013 at 12:54 PM, Jared Griffith 
 jgriff...@picsauditing.com
  wrote:

  Since we are on the topic.  I noticed that the wiki's Tomcat set up is
  horribly outdated and pretty much useless for the current Solr version.
 
 
  On Thu, Aug 29, 2013 at 9:35 AM, Shawn Heisey s...@elyograg.org wrote:
 
   On 8/29/2013 10:09 AM, Carmine Paternoster wrote:
  
   Thank you Shawn, but the logging is correctly configured, because the
  INFO
   message logging is stamped, or not? Any other suggest?
  
  
   I don't know what that INFO message means. Can you use a paste website
 (
   http://apaste.info being one example) and share exactly what's in your
   Solr log?  You mentioned log4j.properties, so this might be different
  than
   your tomcat log.
  
   Thanks,
   Shawn
  
  
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 




-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


Re: SolrCloud Set up

2013-08-30 Thread Jared Griffith
One last thing.  Is there any real benefit to running SolrCloud and
ZooKeeper separately?  I am seeing some funkiness with the separation of the
two, funkiness I wasn't seeing when running SolrCloud + ZooKeeper together
as outlined in the wiki.


On Thu, Aug 29, 2013 at 4:52 PM, Jared Griffith
jgriff...@picsauditing.comwrote:

 Cool, thanks.


 On Thu, Aug 29, 2013 at 4:19 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/29/2013 4:53 PM, Jared Griffith wrote:

 OK, so to get our initial documents in, we would use the curl / java
 upload
 calls as documented in the wiki.  Then once we get it all plugged into
 our
 application, we would use the SolrJ client and plug in zookeeper
 information there, so that the application could then update and retrieve
 data correct?


 You can use SolrJ for everything, even initially loading the documents.
  Because it's an API and not an app, you'd have to write the code.

 If you have an established procedure using curl or other tools, go ahead
 and use it.

 Thanks,
 Shawn




 --

 Jared Griffith
 Linux Administrator, PICS Auditing, LLC
 P: (949) 936-4574
 C: (909) 653-7814

 http://www.picsauditing.com

 17701 Cowan #140 | Irvine, CA | 92614

 Join PICS on LinkedIn and Twitter!

 https://twitter.com/PICSAuditingLLC




-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
Hi,
I'm using solr 4.0 final built around Dec 2012.
I was initially told that the QEC didn't work for distributed search but
apparently it was fixed.
Anyway, I use the /elevate handler with [elevated] in the field list and I
don't get any elevated results.
elevated=false in the result block.
However, if I turn on debugQuery, the elevated result appears in the debug
section under queryBoosting.
Is this the only way you can get elevated results?
Because before (and I can't remember if this was before or after I went to
4.0 Final) I would get the elevated results mixed in with the regular
results in the result block.
elevated=true was the only way to tell them apart.
I also tried forceElevation, enableElevation, and exclusive, but there are
still no elevated results in the result block.
What am I doing wrong?
query:
http://localhost:8080/solr/Profiles/elevate?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=100&enableElevation=true&forceElevation=true&df=text&qt=edismax&debugQuery=true
Here's my config:
  <searchComponent name="elevator" class="solr.QueryElevationComponent">
    <str name="queryFieldType">text_general</str>
    <str name="config-file">elevate.xml</str>
  </searchComponent>

  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>
elevate.xml:
<elevate>
  <query text="gangnam style">
    <doc
id="https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3"
/>
  </query>
</elevate>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud - Path must not end with / character

2013-08-30 Thread Jared Griffith
I was getting the same errors when trying to implement SolrCloud with
Tomcat.  I eventually gave up until something came out of this thread.
This all works if you just ditch Tomcat and go with the native Jetty server.


On Fri, Aug 30, 2013 at 6:28 AM, Prasi S prasi1...@gmail.com wrote:

 Also, this fails with the default downloaded Solr 4.4 configuration too


 On Fri, Aug 30, 2013 at 4:19 PM, Prasi S prasi1...@gmail.com wrote:

  Below is the script i run
 
  START /MAX
  F:\SolrCloud\zookeeper\zk-server-1\zookeeper-3.4.5\bin\zkServer.cmd
 
 
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir
  solr-conf -confname solrconf1
 
 
 
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
  org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182 -collection
  firstcollection -confname solrconf1 -solrhome ../tomcat1/solr1
 
 
 
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir
  solr-conf -confname solrconf2
 
 
 
 
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
  org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182 -collection
  seccollection -confname solrconf2 -solrhome ../tomcat1/solr1
 
 
 
  START /MAX F:\solrcloud\tomcat1\bin\startup.bat
 
 
 
  START /MAX F:\solrcloud\tomcat2\bin\startup.bat
 
 
  On Fri, Aug 30, 2013 at 4:07 PM, Prasi S prasi1...@gmail.com wrote:
 
  I'm still clueless about where the issue could be. There is not much
  information in the Solr logs.
 
  I had a running version of SolrCloud on another server. I copied the
  same setup to this server, started ZooKeeper, then ran the commands below:
 
  java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig
  -zkhost localhost:2181 -confdir solr-conf -confname solrconfindex
 
  java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig
  -zkhost 127.0.0.1:2181 -collection colindexer -confname solrconfindex
  -solrhome ../tomcat1/solr1
 
  After this, when I started Tomcat, the first Tomcat starts fine. When the
  second Tomcat is started, I get the above exception and it stops. Then the
  first Tomcat also shows the same exception.
 
 
 
 
  On Thu, Aug 29, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Yeah, you see this when the core could not be created. Check the logs
 to
  see if you can find something more useful.
 
  I ran into this again the other day - it's something we should fix. You
  see the same thing in the UI when a core cannot be created and it
 gives you
  no hint about the problem and is confusing.
 
  - Mark
 
  On Aug 29, 2013, at 5:23 AM, sathish_ix skandhasw...@inautix.co.in
  wrote:
 
   Hi ,
  
   Check your configuration files uploaded into zookeeper is valid and
 no
  error
   in config files uploaded.
   I think due to this error, solr core will not be created.
  
   Thanks,
   Sathish
  
  
  
   --
   View this message in context:
 
 http://lucene.472066.n3.nabble.com/SolrCloud-Path-must-not-end-with-character-tp4087159p4087182.html
   Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 




-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


Re: Collection - loadOnStartup

2013-08-30 Thread Erick Erickson
I would _really_ advise you avoid trying this without an absolutely
compelling, _demonstrated_ need.

Unless you have a _lot_ of cores, the load time
for even 100 or so isn't horrible, so if you're noticing a
significant lag time, it's probably worth looking at why that is before
jumping into using unsupported options. If you insist on doing this,
I'll be anxiously awaiting reports of whatever problems you have.

There are tests that hammer this functionality - see TestLazyCores -
and core loading hums right along.

So the first thing I'd look at is if your configuration is such that you
have large transaction logs that are getting replayed at startup, see:
http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

After that I'd try to pin down why you are seeing long load times
if that's not relevant.

And you haven't told us anything about your setup, this smells like
an XY problem.

Best
Erick


On Fri, Aug 30, 2013 at 10:13 AM, Srivatsan ranjith.venkate...@gmail.comwrote:

 I started looking in reducing the time taken to load cores during cluster
 restart. For initializing the core, building config file takes considerable
 amount of time. In our case, schema remains same for all collections(cores)
 . Hence it can be kept static, and avoid loading time.

 In the mean while i have gone through schemaless feature of solr-4.5. In
 such a case, do we need to define schema.xml file or it ll automatically
 assigns field types dynamically? If there is no need to define schema.xml
 file, then automatically core initialization takes lesser time  rite?


 Thanks in advance

 Srivatsan



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Collection-loadOnStartup-tp4082531p4087510.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud Set up

2013-08-30 Thread Jason Hellman
One additional thought here:  from a paranoid risk-management perspective it's 
not a good idea to have two critical services dependent upon a single point of 
failure if the hardware fails.  Obviously risk-management is suited to taste, 
so you may feel the cost/benefit does not merit the separation.  But it's good 
to make that decision consciously…you'd hate to have to justify a failure here 
after-the-fact as something overlooked :)


On Aug 30, 2013, at 9:40 AM, Shawn Heisey s...@elyograg.org wrote:

 On 8/30/2013 9:43 AM, Jared Griffith wrote:
 One last thing.  Is there any real benefit in running SolrCloud and
 Zookeeper separate?   I am seeing some funkiness with the separation of the
 two, funkiness I wasn't seeing when running SolrCloud + Zookeeper together
 as outlined in the Wiki.
 
 For a robust install, you want zookeeper to be a separate process.  It can 
 run on the same server as Solr, but the embedded zookeeper (-DzkRun) should 
 not be used except for dev and proof of concept work.
 
 The reason is simple.  Zookeeper is the central coordinator for SolrCloud.  
 In order for it to remain stable, it should not be restarted without good 
 reason.  If you are running zookeeper as part of Solr, then you will be 
 affecting zookeeper operation anytime you restart that instance of Solr.
 
 Making changes to your Solr setup often requires that you restart Solr.  This 
 includes upgrading Solr and changing some aspects of its configuration.  Some 
 configuration aspects can be changed with just a collection reload, but 
 others require a full application restart.
 
 Thanks,
 Shawn
 



Re: SolrCloud Set up

2013-08-30 Thread Shawn Heisey

On 8/30/2013 9:43 AM, Jared Griffith wrote:

One last thing.  Is there any real benefit in running SolrCloud and
Zookeeper separate?   I am seeing some funkiness with the separation of the
two, funkiness I wasn't seeing when running SolrCloud + Zookeeper together
as outlined in the Wiki.


For a robust install, you want zookeeper to be a separate process.  It 
can run on the same server as Solr, but the embedded zookeeper (-DzkRun) 
should not be used except for dev and proof of concept work.


The reason is simple.  Zookeeper is the central coordinator for 
SolrCloud.  In order for it to remain stable, it should not be restarted 
without good reason.  If you are running zookeeper as part of Solr, then 
you will be affecting zookeeper operation anytime you restart that 
instance of Solr.


Making changes to your Solr setup often requires that you restart Solr. 
 This includes upgrading Solr and changing some aspects of its 
configuration.  Some configuration aspects can be changed with just a 
collection reload, but others require a full application restart.


Thanks,
Shawn
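For reference, a minimal standalone ZooKeeper ensemble for this kind of setup (hostnames and paths below are placeholders, not from the thread) is just a short zoo.cfg, identical on each of the three nodes:

```properties
# zoo.cfg - same file on all three ensemble nodes
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each node also needs a dataDir/myid file containing just its server number (1, 2, or 3), and Solr is then pointed at the ensemble with something like -DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181 instead of -DzkRun.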



Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread Chris Hostetter
: What am I doing wrong?
: query:
: 
http://localhost:8080/solr/Profiles/elevate?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=100&enableElevation=true&forceElevation=true&df=text&qt=edismax&debugQuery=true

You've shown us the configs you are using and the query you executed, and
you've said that you see evidence of the QueryElevationComponent working
according to the debug score explanation -- but you haven't shown us the
actual response of your sample query, so we can't understand what exactly
you are getting in response.

Without that it's hard to understand what you are seeing or make any guesses
as to what might be going wrong.

Can you please show us the response of that example query -- and if that
example response doesn't include the doc from your elevate.xml, can you
change the example (increase rows or whatever) so that it does, so we can
see evidence that it's really in your index and not getting elevated?

: Here's my config:
:   <searchComponent name="elevator" class="solr.QueryElevationComponent">
:     <str name="queryFieldType">text_general</str>
:     <str name="config-file">elevate.xml</str>
:   </searchComponent>
: 
:   <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
:     <lst name="defaults">
:       <str name="echoParams">explicit</str>
:       <str name="df">text</str>
:     </lst>
:     <arr name="last-components">
:       <str>elevator</str>
:     </arr>
:   </requestHandler>
: elevate.xml:
: <elevate>
:   <query text="gangnam style">
:     <doc
: id="https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3"
: />
:   </query>
: </elevate>
: 
: 
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss


Re: SolrCloud Set up

2013-08-30 Thread Jared Griffith
That's what I was thinking.  Though I am seeing some funkiness that I
wasn't seeing with Solr  Zookeeper running together.


On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org wrote:

 On 8/30/2013 9:43 AM, Jared Griffith wrote:

 One last thing.  Is there any real benefit in running SolrCloud and
 Zookeeper separate?   I am seeing some funkiness with the separation of
 the
 two, funkiness I wasn't seeing when running SolrCloud + Zookeeper together
 as outlined in the Wiki.


 For a robust install, you want zookeeper to be a separate process.  It can
 run on the same server as Solr, but the embedded zookeeper (-DzkRun) should
 not be used except for dev and proof of concept work.

 The reason is simple.  Zookeeper is the central coordinator for SolrCloud.
  In order for it to remain stable, it should not be restarted without good
 reason.  If you are running zookeeper as part of Solr, then you will be
 affecting zookeeper operation anytime you restart that instance of Solr.

 Making changes to your Solr setup often requires that you restart Solr.
  This includes upgrading Solr and changing some aspects of its
 configuration.  Some configuration aspects can be changed with just a
 collection reload, but others require a full application restart.

 Thanks,
 Shawn




-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
Sure,
Here are the results with debugQuery=true; with debugging off, there are
no results.
The elevated result appears in the queryBoosting section but not in the
result section:
<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="enableElevation">true</str>
      <str name="wt">xml</str>
      <str name="rows">100</str>
      <str name="fl">*,[elevated]</str>
      <str name="df">text</str>
      <str name="debugQuery">true</str>
      <str name="start">0</str>
      <str name="q">gangnam</str>
      <str name="forceElevation">true</str>
      <str name="qt">edismax</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <lst name="queryBoosting">
      <str name="q">gangnam</str>
      <arr name="match">
        <str>https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3</str>
      </arr>
    </lst>
    <str name="rawquerystring">gangnam</str>
    <str name="querystring">gangnam</str>
    <str name="parsedquery">(text:gangnam ((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0))/no_coord</str>
    <str name="parsedquery_toString">text:gangnam ((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0)</str>
    <lst name="explain" />
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">0.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.QueryElevationComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.QueryElevationComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531p4087554.html
Sent from the Solr - User mailing list archive at Nabble.com.


ArrayIndexOutOfBoundsException Solr 4.4.0 - Cloud display

2013-08-30 Thread joeo
Hi!  I've been using Solr Cloud 4.3.1 with zookeeper and a several shard
setup.  When I try to use Solr Cloud 4.4.0, and bring up a 2 shard setup, it
seems to load fine without errors.  However when I go to the web interface
and click 'cloud' an exception is thrown:

43242 [qtp965223859-14] WARN  org.eclipse.jetty.servlet.ServletHandler -
/solr/zookeeper
java.lang.ArrayIndexOutOfBoundsException: 213
at
org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:620)
at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:168)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:303)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:228)
at
org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:106)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:669)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1448)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:399)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:619)

This happens with a 2 shard or 40 shard setup.  Has anyone seen this before?
Thank you!

Joe Obernberger




--
View this message in context: 
http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-Solr-4-4-0-Cloud-display-tp4087567.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
I can guarantee you that the ID is unique and it exists in that index.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/QueryElevationComponent-results-only-show-up-with-debug-true-tp4087531p4087565.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ArrayIndexOutOfBoundsException Solr 4.4.0 - Cloud display

2013-08-30 Thread joeo
I changed my zkHost to not use the root level, and now it works.  I went from
using:
-DzkHost=host1.domain.com:2181,host2.domain.com:2181,host3.domain.com:2181

to

-DzkHost=host1.domain.com:2181,host2.domain.com:2181,host3.domain.com:2181/s440
which places all the ZooKeeper values from Solr into the 'subdirectory'
s440.  Now the cloud display works!
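For illustration (a hedged sketch, not SolrCloud's actual parsing code), the chroot semantics of a zkHost string can be shown with a small helper: the suffix starting at the first '/' applies to the whole ensemble, not just to the last host listed.

```python
def parse_zkhost(zkhost):
    """Split a Solr-style zkHost string into (hosts, chroot).

    The optional chroot suffix (everything from the first '/') scopes
    the entire ZooKeeper ensemble, not only the host it is appended to.
    """
    if "/" in zkhost:
        hosts, _, chroot = zkhost.partition("/")
        return hosts.split(","), "/" + chroot
    return zkhost.split(","), ""
```

So `host1:2181,host2:2181,host3:2181/s440` means "all three servers, everything under the /s440 node".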

Joe Obernberger



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-Solr-4-4-0-Cloud-display-tp4087567p4087578.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ArrayIndexOutOfBoundsException Solr 4.4.0 - Cloud display

2013-08-30 Thread joeo
Thank you for the reply.  Is there any workaround available?  Could it be
related to the number of external ZooKeeper servers?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-Solr-4-4-0-Cloud-display-tp4087567p4087573.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread Chris Hostetter

: Here are the results with the debugQuery=true; with debugging off, there are
: no results.

I don't understand what you mean.  According to this response, even with 
debugQuery=true, there are no results matching your query...

:   result name=response numFound=0 start=0/result


: The elevated result appears in the queryBoost section but not in the 
: result section:

Again: I don't understand what you mean -- the debug output is just 
showing you that your QEC is in fact configured to boost a particular 
document for this query, and it's telling you the ID it's trying to 
boost based on your config -- that doesn't mean the *result* is in the 
queryBoost section.

As far as I can tell from this, the document you want to boost 
either isn't in your index at all, or doesn't match your query.

please see my previous comment...

 Can you please show us the response of that example query -- and if that 
 example response doesn't include the doc from your elevate.xml, can you 
 change the example (increase rows or whatever) so that it does, so we can 
 see evidence that it's really in your index and not getting elevated?

...can you show us *any* query that proves a document with that ID exists 
in the index?  Ignoring QEC for a moment, can you use debugQuery=true and 
explainOther to see if/why that document doesn't seem to match your query 
at all?
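To make that check concrete, a request along these lines (a hypothetical example: host, core name, and YOUR_DOC_ID are placeholders) asks Solr to explain how a specific document scores against the query, even when it isn't in the main result set:

```shell
# Hypothetical example -- substitute your host, core, query, and doc ID.
# explainOther reports matching/scoring detail for the named document
# alongside the normal debugQuery output.
curl "http://localhost:8983/solr/collection1/select?q=your+query&debugQuery=true&explainOther=id:YOUR_DOC_ID&rows=0&wt=json"
```

If the explainOther section shows the document failing every clause, the elevation ID is fine but the document simply doesn't match the query.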

-Hoss


Re: Change the score of a document based on the *value* of a multifield using dismax

2013-08-30 Thread danielitos85
Ok, agreed.
I mean that I want to set a boost on each review/description (multifield) of
the Places, and this boost is the corresponding value of the distance
between the place and the particular kind of place that I have as a
document.

Is it clear?

I'll try to explain the situation again:

 - my document is a particular kind of Place (for example, an airport)
 - my fields are all the text (reviews and descriptions) of the places
around the airport at a distance of less than 10 km.

Now if I search for pizza, Solr returns the airport whose reviews and
descriptions contain the term pizza many times, without considering the
distance. So I want to set a boost on each review/description based on the
value of the distance.
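As an aside, one common way distance-weighted scoring is done in Solr (a hedged sketch of a different data model, not a direct fix for the aggregated-text schema described above) is to index each place as its own document with a spatial `location` field and apply a multiplicative distance decay with edismax:

```shell
# Hypothetical sketch -- host, core, field name, and coordinates are placeholders.
# recip(geodist(),2,200,20) shrinks the boost as the distance (km) from the
# point given in &pt grows; sfield names the spatial field geodist() reads.
curl "http://localhost:8983/solr/places/select?defType=edismax&q=pizza&sfield=location&pt=44.8,20.5&boost=recip(geodist(),2,200,20)"
```

With that model the per-document distance is computed at query time instead of being stored per review/description.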

Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-tp4087503p4087563.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ArrayIndexOutOfBoundsException Solr 4.4.0 - Cloud display

2013-08-30 Thread Chris Hostetter

: I changed my zkhost to not use the root level and now it works.  I went from

Thanks for that feedback, Joe -- I've updated SOLR-3852 to clarify the 
underlying cause of the problem and pointed out the possible workaround 
that you uncovered/confirmed.

I've also uploaded a patch if you'd like to test it out.

https://issues.apache.org/jira/browse/SOLR-3852


-Hoss


Re: ArrayIndexOutOfBoundsException Solr 4.4.0 - Cloud display

2013-08-30 Thread Chris Hostetter

: Hi!  I've been using Solr Cloud 4.3.1 with zookeeper and a several shard
: setup.  When I try to use Solr Cloud 4.4.0, and bring up a 2 shard setup, it
: seems to load fine without errors.  However when I go to the web interface
: and click 'cloud' an exception is thrown:

This looks like SOLR-3852 ... I'm not really sure what the goal was with 
the code in question, but I'm going to go ahead and rip it out, since 
it's unused.


-Hoss


Re: ArrayIndexOutOfBoundsException Solr 4.4.0 - Cloud display

2013-08-30 Thread joeo
Hoss - thank you very very much!  This gets me moving again.
http://lucene.472066.n3.nabble.com/file/n4087591/Cluster1.jpg 
-Joe Obernberger



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-Solr-4-4-0-Cloud-display-tp4087567p4087591.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Order of fields in a search query.

2013-08-30 Thread Chris Hostetter

: while the results would be identical irrespective of the order in which you
: AND
: A, B and C, will the response times of the following queries differ in any
: way?
: 
: C  B  A
: A  B  C

The queries won't be *cached* the same at the Solr level, because the 
BooleanQuery generated by the parsers won't be 100% identical, but the 
execution of those (uncached) queries should be virtually identical.

: Does Lucene/Solr pick the best query execution plan in terms of both space
: and time for a given query?

It's not a query execution plan so much as a skip-ahead pattern.  
The sub-scorers for each of the sub-queries are looped over, and each one 
is asked to identify the first doc id (X) that it matches, at or 
after the first doc id (Y) returned by the last sub-query consulted 
-- starting with Y=0.  Each time a new X is found, the looping 
starts again on the remaining sub-scorers until a match is found (or we run 
out of documents).

So regardless of the order of the clauses in the original 
BooleanQuery, the scorers for each clause are constantly reordered based 
on the first document they match after the currently considered 
document.  
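The skip-ahead pattern described above can be sketched over plain sorted doc-id lists (a simplified illustration, not Lucene's actual scorer code):

```python
import bisect

def advance(postings, target):
    # first doc id in the sorted list `postings` that is >= target; None if exhausted
    i = bisect.bisect_left(postings, target)
    return postings[i] if i < len(postings) else None

def conjunction(lists):
    """Doc ids present in every sorted list, found leapfrog-style: each
    clause is asked to skip ahead to the current candidate; whenever one
    lands past it, the candidate jumps forward and the scan restarts."""
    if not lists:
        return []
    matches = []
    candidate = 0
    while True:
        restart = True
        while restart:
            restart = False
            for postings in lists:
                doc = advance(postings, candidate)
                if doc is None:
                    return matches  # one clause is exhausted: no more matches
                if doc > candidate:
                    candidate = doc  # leapfrog past the old candidate
                    restart = True
                    break
        matches.append(candidate)
        candidate += 1
```

Note the result is the same whichever order the clause lists are given in, which is the point about clause order above.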

-Hoss


Early Access Release #6 for Solr 4.x Deep Dive is now available for download on Lulu.com

2013-08-30 Thread Jack Krupansky
Okay, it's hot off the e-presses: my updated book Solr 4.x Deep Dive, Early 
Access Release #6 is now available for purchase and download as an e-book 
for $9.99 on Lulu.com at:


http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21120181.html

(That link says release-1, but it apparently correctly redirects to EAR 
#6.)


Summary of changes:

* Coverage of Core Admin API

A total of 56 pages of additional content since EAR #4, with two new appendices 
(solr.xml format, new and legacy).


Please feel free to email or comment on my blog 
(http://basetechnology.blogspot.com/) for any questions or issues related to 
the book.


Thanks!

-- Jack Krupansky