Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Floyd Wu
After trying some search cases and different parameter combinations of
WordDelimiterFilter, I wonder what the best strategy is to index the string
2DA012_ISO MARK 2 so that it can be found with the term 2DA012?

What if I just want the _ to be removed at both query and index time? What
should I configure, and how?

Floyd
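
A minimal sketch of two possible approaches (untested; the field type names
are illustrative and should be adapted to the actual schema). The first strips
the underscore before tokenization, the second lets WordDelimiterFilterFactory
split 2DA012_ISO into 2DA012 and ISO while keeping the original token:

<!-- Option 1: remove "_" at both index and query time by replacing it
     with a space before the tokenizer runs. -->
<fieldType name="text_nounderscore" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Option 2: split on "_" but keep the original token, so both
     2DA012 and 2DA012_ISO remain searchable. -->
<fieldType name="text_worddelim" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>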



2013/8/22 Floyd Wu floyd...@gmail.com

 Thank you all.
 By the way, Jack, I'm gonna buy your book. Where can I buy it?
 Floyd


 2013/8/22 Jack Krupansky j...@basetechnology.com

 I thought that the StandardTokenizer always split on punctuation, 

 Proving that you haven't read my book! The section on the standard
 tokenizer details the rules that the tokenizer uses (in addition to
 extensive examples.) That's what I mean by deep dive.

 -- Jack Krupansky

 -Original Message- From: Shawn Heisey
 Sent: Wednesday, August 21, 2013 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to avoid underscore sign indexing problem?


 On 8/21/2013 7:54 PM, Floyd Wu wrote:

 When using StandardAnalyzer to tokenize the string Pacific_Rim, I get:

 ST
 text:      pacific_rim
 raw_bytes: [70 61 63 69 66 69 63 5f 72 69 6d]
 start:     0
 end:       11
 type:      ALPHANUM
 position:  1

 How can I make this string tokenize into the two tokens Pacific and
 Rim?
 Should I set _ as a stopword?
 Please kindly help on this.
 Many thanks.


 Interesting.  I thought that the StandardTokenizer always split on
 punctuation, but apparently that's not the case for the underscore
 character.

 You can always use the WordDelimiterFilter after the StandardTokenizer.

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

 Thanks,
 Shawn





Re: Data Import faile in solr 4.3.0

2013-08-22 Thread Montu v Boda
Thanks for suggestion

but in our view this is not the right approach: re-indexing all the data each
and every time we migrate Solr from an older version to the latest one. Solr
has to provide a solution for this, because re-indexing 50 lakh (5 million)
documents is not an easy job.

We want to know whether there is any way in Solr to do this easily.

Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868p4086020.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Indexing Status

2013-08-22 Thread Prasi S
I am not using DIH for indexing CSV files; I'm pushing data through SolrJ
code. But I want a status report like the one DIH gives, i.e. fire a
command=status and get the response. Is anything like that available for
any type of file indexing that we do through the API?


On Thu, Aug 22, 2013 at 12:09 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Yes, you can invoke
 http://host:port/solr/dataimport?command=status which will return
 how many Solr docs have been added etc.

 On Wed, Aug 21, 2013 at 4:56 PM, Prasi S prasi1...@gmail.com wrote:
  Hi,
  I am using Solr 4.4 to index CSV files, and I am using SolrJ for this. At
  frequent intervals my user may request the status. I have to send back
  something like DIH's "Indexing in progress. Added xxx documents."
 
  Is there anything like in DIH, where we can fire a command=status to get
  the status of indexing for files?
 
 
  Thanks,
  Prasi



 --
 Regards,
 Shalin Shekhar Mangar.



relation between optimize and merge

2013-08-22 Thread YouPeng Yang
Hi All

   I have some difficulty understanding the relation between
optimize and merge.
  Can anyone give some tips about the difference?

Regards


when does RAMBufferSize work when commit.

2013-08-22 Thread YouPeng Yang
Hi all
About RAMBufferSize and commit, I have read this doc:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/60544

   I cannot figure out how they work together.

  Given the settings:

 <ramBufferSizeMB>10</ramBufferSizeMB>
 <autoCommit>
   <maxTime>${solr.autoCommit.maxDocs:1000}</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 If the number of indexed docs reaches 1000 and the size of these docs is
below 10MB, it will trigger a commit.

 If the size of the indexed docs reaches 10MB while the number is still below
1000, it will not trigger a commit; the indexed docs will just be flushed
to disk, and it will only commit when the number reaches 1000?

 Are the two scenarios right?


Regards


SOLUTION: Clusterstate says state:recovering, but Core says I see state: null?

2013-08-22 Thread Tor Egil
Aliasing instead of swapping removed this problem!

DO NOT USE SWAP WHEN IN CLOUD MODE (solr 4.3)
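
For reference (not part of the original post), an alias can be created with
the Collections API along these lines; the host and names here are
illustrative:

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=kunde&collections=kunde0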



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clusterstate-says-state-recovering-but-Core-says-I-see-state-null-tp4084504p4086037.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCmdDistributor may not be threadsafe...

2013-08-22 Thread Tor Egil
I have been running DIH Imports (15 000 000 rows) all day and every now and
then I get some weird errors. Some examples:

A letter is replaced by an unknown character (should have been a 'V'):
285680 [Thread-20] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.127:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
undefined field: KUNDE_ETTERNA?N
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

  

938360 [Thread-59] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.186:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unexpected character 'l' (code 108) in start tag Expected a quote
 at [row,col {unknown-source}]: [1,2188]
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
...

1379931 [Thread-22] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.186:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unexpected character '0' (code 48) in start tag Expected a quote


2546924 [Thread-79] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.127:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unexpected character '0' (code 48) in content after '' (malformed start
element?).
 at [row,col {unknown-source}]: [1,6333]
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   

I'm running Solr 4.4.0 (r1504776) on jdk1.7.0_21 with 3 nodes.

Seen this before?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCmdDistributor-may-not-be-threadsafe-tp4086042.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Indexing Status

2013-08-22 Thread Shalin Shekhar Mangar
You can use the /admin/mbeans handler to get all system stats. You can
find stats such as adds and cumulative_adds under the update
handler section.

http://localhost:8983/solr/collection1/admin/mbeans?stats=true

On Thu, Aug 22, 2013 at 12:35 PM, Prasi S prasi1...@gmail.com wrote:
 I am not using dih for indexing csv files. Im pushing data through solrj
 code. But i want a status something like what dih gives. ie. fire a
 command=status and we get the response. Is anythin like that available for
 any type of file indexing which we do through api ?


 On Thu, Aug 22, 2013 at 12:09 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 Yes, you can invoke
 http://host:port/solr/dataimport?command=status which will return
 how many Solr docs have been added etc.

 On Wed, Aug 21, 2013 at 4:56 PM, Prasi S prasi1...@gmail.com wrote:
  Hi,
  I am using solr 4.4 to index csv files. I am using solrj for this. At
  frequent intervels my user may request for Status. I have to send get
  something like in DIH  Indexing in progress.. Added xxx documents.
 
  Is there anything like in dih, where we can fire a command=status to get
  the status of indexing for files.
 
 
  Thanks,
  Prasi



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Data Import faile in solr 4.3.0

2013-08-22 Thread Shalin Shekhar Mangar
No one is asking you to re-index data. The Solr 3.5 index can be read
and written by a Solr 4.x installation.

On Thu, Aug 22, 2013 at 12:08 PM, Montu v Boda
montu.b...@highqsolutions.com wrote:
 Thanks for suggestion

 but as per us this is not the right way to re-index all the data each and
 every time. we mean when we migrate the sole from older to latest version.
 there is some way that solr have to provide the solutions for this because
 re indexing the 50 lac document is not an easy job.

 we want to know is there any way in solr to do this in easily.

 Thanks  Regards
 Montu v Boda



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868p4086020.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


DIH not proceeding after few millions

2013-08-22 Thread Prasi S
Hi, I'm using DIH to index data into Solr (version 4.4). Indexing
proceeds normally in the beginning.

I have some 10 data-config files.

file1 - select * from table where id between 1 and 100

file2 - select * from table where id between 100 and 300. and so
on.

Here 4 batches go normally. For the fifth batch, I get the following status
from the Admin page (Dataimport):

*Duration: 2 hrs*
Indexed: 0 documents; deleted: 0 documents.

And indexing stops, but no documents were indexed. I use a single external
ZooKeeper for this.

I don't see any exception in the Solr logs; in ZooKeeper, below is the status.

INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got
user-level KeeperException when processing sessionid:0x1 40a4ce824b0005
type:create cxid:0x29a zxid:0x157d txntype:-1 reqpath:n/a Error P

Any ideas?


Re: Solr Indexing Status

2013-08-22 Thread Prasi S
Thanks much . This was useful.


On Thu, Aug 22, 2013 at 2:24 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 You can use the /admin/mbeans handler to get all system stats. You can
 find stats such as adds and cumulative_adds under the update
 handler section.

 http://localhost:8983/solr/collection1/admin/mbeans?stats=true

 On Thu, Aug 22, 2013 at 12:35 PM, Prasi S prasi1...@gmail.com wrote:
  I am not using dih for indexing csv files. Im pushing data through solrj
  code. But i want a status something like what dih gives. ie. fire a
  command=status and we get the response. Is anythin like that available
 for
  any type of file indexing which we do through api ?
 
 
  On Thu, Aug 22, 2013 at 12:09 AM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  Yes, you can invoke
  http://host:port/solr/dataimport?command=status which will return
  how many Solr docs have been added etc.
 
  On Wed, Aug 21, 2013 at 4:56 PM, Prasi S prasi1...@gmail.com wrote:
   Hi,
   I am using solr 4.4 to index csv files. I am using solrj for this. At
   frequent intervels my user may request for Status. I have to send
 get
   something like in DIH  Indexing in progress.. Added xxx documents.
  
   Is there anything like in dih, where we can fire a command=status to
 get
   the status of indexing for files.
  
  
   Thanks,
   Prasi
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Flushing cache without restarting everything?

2013-08-22 Thread Dmitry Kan
But is it really good benchmarking if you flush the cache? Wouldn't you
want to benchmark against a system that is comparable to what is under
real (= production) load?

Dmitry


On Tue, Aug 20, 2013 at 9:39 PM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:

 I just want to run benchmarks and want to have the same starting
 conditions.

  -Original Message-
  From: Walter Underwood [mailto:wun...@wunderwood.org]
  Sent: August-20-13 2:06 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Flushing cache without restarting everything?
 
   Why? What are you trying to achieve with this? --wunder
 
  On Aug 20, 2013, at 11:04 AM, Jean-Sebastien Vachon wrote:
 
   Hi All,
  
   Is there a way to flush the cache of all nodes in a Solr Cloud (by
 reloading all
  the cores, through the collection API, ...) without having to restart
 all nodes?
  
   Thanks
 
 
 
 
 



Re: Data Import faile in solr 4.3.0

2013-08-22 Thread Montu v Boda
thanks

Actually the problem is that we migrated the Solr 1.4 index data to
Solr 3.5 using the replication feature of Solr 3.5, so whatever data we
have in Solr 3.5 is still in the Solr 1.4 index format.

So I do not think it will work in Solr 4.x.

Please suggest your view based on the above point.

Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868p4086068.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-22 Thread skorrapa
Hello All,

I am also facing a similar issue. I am using Solr 4.3.
Following is the configuration I gave in schema.xml
 <fieldType name="string_lower_case" class="solr.TextField"
     sortMissingLast="true" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 <fieldType name="string_id_itm" class="solr.TextField"
     sortMissingLast="true" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

My requirement is that any string I give during search should be treated as
a single string and matched case-insensitively.
I have strings like first name and last name (for these I am using
string_lower_case), and strings with the special character '/' (for these I am
using string_id_itm).
But I am not getting the results I expect. The first field type should also
accept strings with spaces and give me results, but it doesn't, and the second
field type doesn't work at all.

e.g. of field values: John Smith (for field type 1)
                      MB56789/A (for field type 2)
Please help.

vehovmar wrote
 Thanks a lot for both replies. Helped me a lot. It seems that
 EdgeNGramFilterFactory on query analyzer was really my problem, I'll have
 to test it a little more to be sure.
 
 
 As for the bf parameter, I think it's quite fine as it is, from
 documentation:
 
 the bf parameter actually takes a list of function queries separated by
 whitespace and each with an optional boost
 Example: bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
 
 And I'm using field function, Example Syntax: myFloatField or
 field(myFloatField)
 
 
 Thanks again to both of you guys!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086070.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facing Solr performance during query search

2013-08-22 Thread Toke Eskildsen
On Wed, 2013-08-21 at 10:09 +0200, sivaprasad wrote:
 The slave will poll for every 1hr. 

And are there normally changes?

 We have configured ~2000 facets and the machine configuration is given
 below.

I assume that you only request a subset of those facets at a time.

How much RAM does your machine have? 
How large is your index in GB?
How many documents do you have in your index?

As you are not explicitly warming your facets and since you have a lot
of them, my guess is that you're performing initializing facet calls all
the time. If the slave only has 32GB of RAM (and thus only about 10GB
for disk cache) and if your index is substantially larger than that, the
initialization will require a lot of non-cached disk access.

Try disabling the slave polling, then send 1000 queries and then re-send
the exact same 1000 queries. Are the response times satisfactory the
second time? If so, you should consider warming your facets and/or try
to come up with a solution where you don't have so many of them.
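
One way to warm facets (a sketch, not from the original mail; the field name
is a placeholder) is a newSearcher warming query in solrconfig.xml:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">some_often_requested_facet</str>
    </lst>
  </arr>
</listener>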

https://sbdevel.wordpress.com/2013/04/16/you-are-faceting-it-wrong/

- Toke Eskildsen, State and University Library, Denmark



Re: relation between optimize and merge

2013-08-22 Thread Jack Krupansky
optimize is an explicit request to perform a merge. Merges occur in the 
background, automatically, as needed or indicated by the parameters of the 
merge policy. An optimize is requested from outside of Solr.
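
For example, an optimize can be requested by posting an XML update message to
the /update handler (a sketch; the core name is a placeholder):

<!-- POST to http://localhost:8983/solr/collection1/update -->
<optimize waitSearcher="true" maxSegments="1"/>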


-- Jack Krupansky

-Original Message- 
From: YouPeng Yang

Sent: Thursday, August 22, 2013 3:18 AM
To: solr-user@lucene.apache.org
Subject: relation between optimize and merge

Hi All

  I do have some diffculty with understand the relation between the
optimize and merge
 Can anyone give some tips about the difference.

Regards 



Re: Data Import faile in solr 4.3.0

2013-08-22 Thread Shalin Shekhar Mangar
Call optimize on your Solr 3.5 server which will write a new index
segment in v3.5 format. Such an index should be read in Solr 4.x
without any problem.

On Thu, Aug 22, 2013 at 5:00 PM, Montu v Boda
montu.b...@highqsolutions.com wrote:
 thanks

 actually the problem is that we have migrated the solr 1.4 index data to
 solr 3.5 using replication feature of solr 3.5. so that what ever data we
 have in solr 3.5 is of solr 1.4.

 so i do not think so it is work in solr 4.x.

 so please suggest your view based on my above point.

 Thanks  Regards
 Montu v Boda



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868p4086068.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


RE: Flushing cache without restarting everything?

2013-08-22 Thread Jean-Sebastien Vachon
How can you validate that the changes you just made had any impact on the 
performance of the cloud if you don't have the same starting conditions?

What we do basically is running a batch of requests to warm up the index and 
then launch the benchmark itself. That way we can measure the impact of our 
change(s). Otherwise there is absolutely no way we can be sure who is 
responsible for the gain or loss of performance.

Restarting a cloud is actually a real pain, I just want to know if there is a 
faster way to proceed.

 -Original Message-
 From: Dmitry Kan [mailto:solrexp...@gmail.com]
 Sent: August-22-13 7:26 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Flushing cache without restarting everything?
 
 But is it really a good benchmarking, if you flush the cache? Wouldn't you
 want to benchmark against a system, that would be comparable to what is
 under real (=production) load?
 
 Dmitry
 
 
 On Tue, Aug 20, 2013 at 9:39 PM, Jean-Sebastien Vachon  jean-
 sebastien.vac...@wantedanalytics.com wrote:
 
  I just want to run benchmarks and want to have the same starting
  conditions.
 
   -Original Message-
   From: Walter Underwood [mailto:wun...@wunderwood.org]
   Sent: August-20-13 2:06 PM
   To: solr-user@lucene.apache.org
   Subject: Re: Flushing cache without restarting everything?
  
   Why? What are you trying to acheive with this? --wunder
  
   On Aug 20, 2013, at 11:04 AM, Jean-Sebastien Vachon wrote:
  
Hi All,
   
Is there a way to flush the cache of all nodes in a Solr Cloud (by
  reloading all
   the cores, through the collection API, ...) without having to
   restart
  all nodes?
   
Thanks
  
  
  
  
  


Re: when does RAMBufferSize work when commit.

2013-08-22 Thread Shawn Heisey
On 8/22/2013 2:25 AM, YouPeng Yang wrote:
 Hi all
 About the RAMBufferSize  and commit ,I have read the doc :
 http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/60544
 
I can not figure out how do they make work.
 
   Given the settings:
 
  <ramBufferSizeMB>10</ramBufferSizeMB>
  <autoCommit>
    <maxTime>${solr.autoCommit.maxDocs:1000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
 
  If the indexs docs up to 1000  and the size of these docs is below 10MB
 ,it will trigger an commit.
 
  If the size of the indexed docs reaches to 10MB while the the number is below
 1000, it will not trigger an commit , however the index docs will just
 be flushed
 to disk,it will only commit when the number reaches to 1000?

Your actual config seems to have its wires crossed a little bit.  You
have the autoCommit.maxDocs value being used in a maxTime tag, not a
maxDocs tag.  You may want to adjust the variable name or the tag.
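
If the intent is to commit after every 1000 added documents, the element would
be maxDocs rather than maxTime; a sketch, keeping the original property name:

<autoCommit>
  <maxDocs>${solr.autoCommit.maxDocs:1000}</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>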

If that were a maxDocs tag instead of maxTime, your description would be
pretty much right on the money.  The space taken in the RAM buffer is
typically larger than the actual document size, but the general idea is
sound.

The default for RAMBufferSizeMB in recent Solr versions is 100.  Unless
you've got super small documents, or you are in a limited memory
situation and have a lot of cores, I would not go smaller than that.

Thanks,
Shawn



Re: Flushing cache without restarting everything?

2013-08-22 Thread Toke Eskildsen
On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
 Is there a way to flush the cache of all nodes in a Solr Cloud (by
 reloading all the cores, through the collection API, ...) without
 having to restart all nodes?

As MMapDirectory shares data with the OS disk cache, flushing of
Solr-related caches on a machine should involve

1) Shut down all Solr instances on the machine
2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
a Linux box)
3) Start the Solr instances

I do not know of any Solr-supported way to do step 2. For our
performance tests we use custom scripts to perform the steps.

- Toke Eskildsen, State and University Library, Denmark



Adding one core to an existing core?

2013-08-22 Thread Bruno Mannina

Dear Users,

(Solr3.6 + Tomcat7)

I have been using Solr with one core for two years; I would now like to add
another core (a new database).

Can I do this without re-indexing my core1?
Could you point me to a good tutorial for doing that?

(My current database is around 200 GB for 86,000,000 docs.)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno



Re: Adding one core to an existing core?

2013-08-22 Thread Bruno Mannina

A small precision: I'm on Ubuntu 12.04 LTS.

On 22/08/2013 15:56, Bruno Mannina wrote:

Dear Users,

(Solr3.6 + Tomcat7)

I use since two years Solr with one core, I would like now to add one 
another core (a new database).


Can I do this without re-indexing my core1 ?
could you point me to a good tutorial to do that?

(my current database is around 200Go for 86 000 000 docs)
My new database will be little, around 1000 documents of 5ko each.

thanks a lot,
Bruno







Re: Adding one core to an existing core?

2013-08-22 Thread Andrea Gazzarini
First, a core is a separate index, so it is completely independent from 
the already existing core(s). So basically you don't need to reindex.


In order to have two cores (but the same applies for n cores): you must 
have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding to 
the instanceDir attribute). I said one or two because if the indexes 
configuration is basically the same (or something changes but is 
dynamically configured - i.e. core name) you can create two instances 
starting from the same configuration. I mean


<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
  <core name="core0" instanceDir="conf.dir" />
  <core name="core1" instanceDir="conf.dir" />
 </cores>
</solr>

Otherwise you must have two different conf directories that contain 
indexes configuration. You should already have a first one (the current 
core), you just need to have another conf dir with solrconfig.xml, 
schema.xml and other required files. In this case each core will have 
its own instanceDir.


<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
  <core name="core0" instanceDir="conf.dir.core0" />
  <core name="core1" instanceDir="conf.dir.core1" />
 </cores>
</solr>

Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

Little precision, I'm on Ubuntu 12.04LTS

On 22/08/2013 15:56, Bruno Mannina wrote:

Dear Users,

(Solr3.6 + Tomcat7)

I use since two years Solr with one core, I would like now to add one 
another core (a new database).


Can I do this without re-indexing my core1 ?
could you point me to a good tutorial to do that?

(my current database is around 200Go for 86 000 000 docs)
My new database will be little, around 1000 documents of 5ko each.

thanks a lot,
Bruno









How to access latitude and longitude with only LatLonType?

2013-08-22 Thread zhangquan913
Hello All,

I am currently doing a spatial query in solr. I indexed coordinates
(type=location class=solr.LatLonType), but the following query failed.
http://localhost/solr/quan/select?q=*:*&stats=true&stats.field=coordinates&stats.facet=township&rows=0
It showed an error:
Field type
location{class=org.apache.solr.schema.SpatialRecursivePrefixTreeFieldType,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={distErrPct=0.025,
class=solr.SpatialRecursivePrefixTreeFieldType, maxDistErr=0.09,
units=degrees}} is not currently supported

I don't want to create duplicate indexed fields latitude and longitude.
How can I use only coordinates to do this kind of stats on both latitude
and longitude?

Thanks,
Quan
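
A side note (not from the original post): the error above shows the
coordinates field is actually a SpatialRecursivePrefixTreeFieldType, not
LatLonType. LatLonType stores its two values in numeric dynamic subfields, and
stats can be run on those; a sketch of the usual schema.xml wiring (names
follow the stock example schema and may differ in a given setup):

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

Stats could then target the subfields, e.g. stats.field=coordinates_0_coordinate
(latitude) and stats.field=coordinates_1_coordinate (longitude).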



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-access-latitude-and-longitude-with-only-LatLonType-tp4086109.html
Sent from the Solr - User mailing list archive at Nabble.com.


dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
I'm trying to index an HTML page and only use the div with the id=content.
Unfortunately nothing is working within the tika-entity; only the standard
text (content) is populated.

Do I have to use copyField for text_test to get the data?
Or is there a problem with the entity hierarchy?
Or is the xpath wrong, even though I've tried it without and just using
text?
Or should I use the update extractor?

data-config.xml:

<dataConfig>
    <dataSource type="BinFileDataSource" name="data"/>
    <dataSource type="BinURLDataSource" name="dataUrl"/>
    <dataSource type="URLDataSource"
        baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
    <document>
        <entity name="rec" processor="XPathEntityProcessor"
            url="docImportUrl.xml" forEach="/docs/doc" dataSource="main">
            <field column="title" xpath="//title" />
            <field column="id" xpath="//id" />
            <field column="file" xpath="//file" />
            <field column="path" xpath="//path" />
            <field column="url" xpath="//url" />
            <field column="Author" xpath="//author" />

            <entity name="tika" processor="TikaEntityProcessor"
                url="${rec.path}${rec.file}" dataSource="dataUrl">
                <!-- <copyField source="text" dest="text_test" /> -->
                <field column="text_test" xpath="//div[@id='content']" />
            </entity>
        </entity>
    </document>
</dataConfig>

docImporterUrl.xml:

<?xml version="1.0" encoding="utf-8"?>
<docs>
    <doc>
        <id>5</id>
        <author>tkb</author>
        <title>Startseite</title>
        <description>blabla ...</description>
        <file>http://localhost/tkb/internet/index.cfm</file>
        <url>http://localhost/tkb/internet/index.cfm/url</url>
        <path2>http\specialConf</path2>
    </doc>
    <doc>
        <id>6</id>
        <author>tkb</author>
        <title>Eigenheim</title>
        <description>Machen Sie sich erste Gedanken über den Erwerb von
            Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein
            spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um
            den Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in
            finanzieller Hinsicht gelingt.</description>
        <file>http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm</file>
        <url>http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url</url>
    </doc>
</docs>

Re: dataimporter tika fields empty

2013-08-22 Thread Alexandre Rafalovitch
Can you try SOLR-4530 switch:
https://issues.apache.org/jira/browse/SOLR-4530

Specifically, setting htmlMapper=identity on the entity definition. This
will tell Tika to send full HTML rather than a seriously stripped one.
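
If I read SOLR-4530 correctly, that is an attribute on the Tika entity,
roughly like this (a sketch based on the config quoted below, not tested here):

<entity name="tika" processor="TikaEntityProcessor"
    url="${rec.path}${rec.file}" dataSource="dataUrl"
    format="html" htmlMapper="identity">
    <field column="text_test" xpath="//div[@id='content']" />
</entity>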

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen a...@conx.ch wrote:

 i'm trying to index a html page and only user the div with the
 id=content. unfortunately nothing is working within the tika-entity, only
 the standard text (content) is populated.

 do i have to use copyField for test_text to get the data?
 or is there a problem with the entity-hirarchy?
 or is the xpath wrong, even though i've tried it without and just
 using text?
 or should i use the updateextractor?

 data-config.xml:

 dataConfig
 dataSource type=BinFileDataSource name=data/
 dataSource type=BinURLDataSource name=dataUrl/
 dataSource type=URLDataSource baseUrl=
 http://127.0.0.1/tkb/internet/; name=main/
 document
 entity name=rec processor=XPathEntityProcessor
 url=docImportUrl.xml forEach=/docs/doc dataSource=main
 field column=title xpath=//title /
 field column=id xpath=//id /
 field column=file xpath=//file /
 field column=path xpath=//path /
 field column=url xpath=//url /
 field column=Author xpath=//author /

 entity name=tika processor=TikaEntityProcessor
 url=${rec.path}${rec.file} dataSource=dataUrl 
 !-- copyField source=text dest=text_test /
 --
 field column=text_test
 xpath=//div[@id='content'] /
 /entity
 /entity
 /document
 /dataConfig

 docImporterUrl.xml:

 ?xml version=1.0 encoding=utf-8?
 docs
 doc
 id5/id
 authortkb/author
 titleStartseite/title
 descriptionblabla .../description
 filehttp://localhost/tkb/internet/index.cfm/file
 urlhttp://localhost/tkb/internet/index.cfm/url/url
 path2http\specialConf/path2
 /doc
 doc
 id6/id
 authortkb/author
 titleEigenheim/title
 descriptionMachen Sie sich erste Gedanken über den
 Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein
 spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
 Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in finanzieller
 Hinsicht gelingt./description
 file
 http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/file
 url
 http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url/url
 /doc
 /docs


UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey
I have an updateProcessor defined.  It seems to work perfectly when I 
index with SolrJ, but when I use DIH (which I do for a full index 
rebuild), it doesn't work.  This is the case with both Solr 4.4 and Solr 
4.5-SNAPSHOT, svn revision 1516342.


Here's a solrconfig.xml excerpt:

<updateRequestProcessorChain name="nohtml">
  <!-- First pass converts entities and strips html. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <!-- Second pass fixes dually-encoded stuff. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>

If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I 
see replace value debugs, but the contents of the index are only 
changed if the update happens with SolrJ, not with DIH.


A side issue.  FieldMutatingUpdateProcessorFactory has the following 
line in it, at about line 72:


if (destVal != srcVal) {

Shouldn't this be the following?

if (destVal.equals(srcVal)) {

Thanks,
Shawn


Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Andrea Gazzarini

You should declare this

<str name="update.chain">nohtml</str>

in the defaults section of the RequestHandler that corresponds to your 
dataimporthandler. You should have something like this:


<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">dih-config.xml</str>
        <str name="update.chain">nohtml</str>
    </lst>
</requestHandler>

Otherwise the default update chain will be called (and your URPs are not 
part of it). SolrJ, behind the scenes, is a client of the /update request 
handler; that's why you can see your URPs working when you use it.


Best,
Gazza


On 08/22/2013 05:35 PM, Shawn Heisey wrote:
I have an updateProcessor defined.  It seems to work perfectly when I 
index with SolrJ, but when I use DIH (which I do for a full index 
rebuild), it doesn't work.  This is the case with both Solr 4.4 and 
Solr 4.5-SNAPSHOT, svn revision 1516342.


Here's a solrconfig.xml excerpt:

updateRequestProcessorChain name=nohtml
  !-- First pass converts entities and strips html. --
  processor class=solr.HTMLStripFieldUpdateProcessorFactory
str name=fieldNameft_text/str
str name=fieldNameft_subject/str
str name=fieldNamekeywords/str
str name=fieldNametext_preview/str
  /processor
  !-- Second pass fixes dually-encoded stuff. --
  processor class=solr.HTMLStripFieldUpdateProcessorFactory
str name=fieldNameft_text/str
str name=fieldNameft_subject/str
str name=fieldNamekeywords/str
str name=fieldNametext_preview/str
  /processor
  processor class=solr.LogUpdateProcessorFactory /
  processor class=solr.RunUpdateProcessorFactory /
/updateRequestProcessorChain

  requestHandler name=/update class=solr.UpdateRequestHandler
lst name=defaults
  str name=update.chainnohtml/str
/lst
  /requestHandler

If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I 
see replace value debugs, but the contents of the index are only 
changed if the update happens with SolrJ, not with DIH.


A side issue.  FieldMutatingUpdateProcessorFactory has the following 
line in it, at about line 72:


if (destVal != srcVal) {

Shouldn't this be the following?

if (destVal.equals(srcVal)) {

Thanks,
Shawn




Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey

On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:

You should declare this

str name=update.chainnohtml/str

in the defaults section of the RequestHandler that corresponds to your
dataimporthandler. You should have something like this:

 requestHandler name=/dataimport
class=org.apache.solr.handler.dataimport.DataImportHandler
 lst name=defaults
 str name=configdih-config.xml/str
 str name=update.chainnohtml/str
 /lst
 /requestHandler

Otherwise the default update chain will be called (and your URP are not
part of that). The solrj, behind the scenes, is a client of the /update
request handler, that's the reason why using that you can see your URP
working.


This results in an error parsing the config, so my cores won't start up. 
 I saw another message via google that talked about using 
update.processor instead of update.chain, so I tried that as well, with 
no luck.


Can I ask DIH to use the /update handler that I have declared already?

Thanks,
Shawn



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Steve Rowe
You could declare your update chain as the default by adding 'default=true' 
to its declaring element:

   <updateRequestProcessorChain name="nohtml" default="true">

and then you wouldn't need to declare it as the default update.chain in either 
of your request handlers.

On Aug 22, 2013, at 11:57 AM, Shawn Heisey s...@elyograg.org wrote:

 On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:
 You should declare this
 
 str name=update.chainnohtml/str
 
 in the defaults section of the RequestHandler that corresponds to your
 dataimporthandler. You should have something like this:
 
 requestHandler name=/dataimport
 class=org.apache.solr.handler.dataimport.DataImportHandler
 lst name=defaults
 str name=configdih-config.xml/str
 str name=update.chainnohtml/str
 /lst
 /requestHandler
 
 Otherwise the default update chain will be called (and your URP are not
 part of that). The solrj, behind the scenes, is a client of the /update
 request handler, that's the reason why using that you can see your URP
 working.
 
 This results in an error parsing the config, so my cores won't start up.  I 
 saw another message via google that talked about using update.processor 
 instead of update.chain, so I tried that as well, with no luck.
 
 Can I ask DIH to use the /update handler that I have declared already?
 
 Thanks,
 Shawn
 



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Andrea Gazzarini
yes, yes of course, you should use your already declared request 
handler...that was just a copied and pasted example :)


I'm curious about what kind of error you got... I copied the snippet 
above from a working core (just replaced the name of the chain).


BTW: AFAIK it is update.processor that has been deprecated in favor 
of update.chain, so this shouldn't be the problem.


Best,
Gazza

On 08/22/2013 05:57 PM, Shawn Heisey wrote:

On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:

You should declare this

str name=update.chainnohtml/str

in the defaults section of the RequestHandler that corresponds to your
dataimporthandler. You should have something like this:

 requestHandler name=/dataimport
class=org.apache.solr.handler.dataimport.DataImportHandler
 lst name=defaults
 str name=configdih-config.xml/str
 str name=update.chainnohtml/str
 /lst
 /requestHandler

Otherwise the default update chain will be called (and your URP are not
part of that). The solrj, behind the scenes, is a client of the /update
request handler, that's the reason why using that you can see your URP
working.


This results in an error parsing the config, so my cores won't start 
up.  I saw another message via google that talked about using 
update.processor instead of update.chain, so I tried that as well, 
with no luck.


Can I ask DIH to use the /update handler that I have declared already?

Thanks,
Shawn





Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey

On 8/22/2013 10:02 AM, Steve Rowe wrote:

You could declare your update chain as the default by adding 'default=true' 
to its declaring element:

updateRequestProcessorChain name=nohtml default=true

and then you wouldn't need to declare it as the default update.chain in either 
of your request handlers.


If I did this, would it apply the HTML processor to only the fields 
that I have specified in those XML sections?  I haven't thought through 
the implications, but I think it might be OK.


Thanks,
Shawn



Re: dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
I put it in the tika-entity as an attribute, but it doesn't change anything. My 
bigger concern is why text_test isn't populated at all.

On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:

 Can you try SOLR-4530 switch:
 https://issues.apache.org/jira/browse/SOLR-4530
 
 Specifically, setting htmlMapper=identity on the entity definition. This
 will tell Tika to send full HTML rather than a seriously stripped one.
 
 Regards,
 Alex.
 
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen a...@conx.ch wrote:
 
 i'm trying to index a html page and only user the div with the
 id=content. unfortunately nothing is working within the tika-entity, only
 the standard text (content) is populated.
 
do i have to use copyField for test_text to get the data?
or is there a problem with the entity-hirarchy?
or is the xpath wrong, even though i've tried it without and just
 using text?
or should i use the updateextractor?
 
 data-config.xml:
 
 dataConfig
dataSource type=BinFileDataSource name=data/
dataSource type=BinURLDataSource name=dataUrl/
dataSource type=URLDataSource baseUrl=
 http://127.0.0.1/tkb/internet/; name=main/
 document
entity name=rec processor=XPathEntityProcessor
 url=docImportUrl.xml forEach=/docs/doc dataSource=main
field column=title xpath=//title /
field column=id xpath=//id /
field column=file xpath=//file /
field column=path xpath=//path /
field column=url xpath=//url /
field column=Author xpath=//author /
 
entity name=tika processor=TikaEntityProcessor
 url=${rec.path}${rec.file} dataSource=dataUrl 
!-- copyField source=text dest=text_test /
 --
field column=text_test
 xpath=//div[@id='content'] /
/entity
/entity
 /document
 /dataConfig
 
 docImporterUrl.xml:
 
 ?xml version=1.0 encoding=utf-8?
 docs
 doc
id5/id
authortkb/author
titleStartseite/title
descriptionblabla .../description
filehttp://localhost/tkb/internet/index.cfm/file
urlhttp://localhost/tkb/internet/index.cfm/url/url
path2http\specialConf/path2
/doc
doc
id6/id
authortkb/author
titleEigenheim/title
descriptionMachen Sie sich erste Gedanken über den
 Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein
 spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
 Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in finanzieller
 Hinsicht gelingt./description
file
 http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/file
url
 http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url/url
/doc
 /docs



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey

On 8/22/2013 10:06 AM, Andrea Gazzarini wrote:

yes, yes of course, you should use your already declared request
handler...that was just a copied and pasted example :)

I'm curious about what kind of error you gotI copied the snippet
above from a working core (just replaced the name of the chain)

BTW: AFAIK is the update.processor that has been deprecated in favor
of update.chain so this shouldn't be the problem.


Here's the full exception.  I use xinclude heavily in my solrconfig.xml. 
 The xinclude directives are actually almost the only thing that's in 
solrconfig.xml.


http://apaste.info/7PB0

I'm going to try setting my update processor to default as recommended 
by Steve Rowe.


Thanks,
Shawn



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Andrea Gazzarini

Ok, found

requestHandler name=/dataimport 
class=org.apache.solr.handler.dataimport.DataImportHandler

lst name=defaults
str name=configdih-config.xml/str
str name=update.chain*nohtml***/str
/lst
/requestHandler

Of course, my mistake...when I changed the name of the chain I deleted 
the  char.

Sorry

On 08/22/2013 06:15 PM, Shawn Heisey wrote:
of update.chain so this shouldn't be the problem. 




Schema

2013-08-22 Thread Kamaljeet Kaur
Hello there,
I have installed Solr and it's working fine on localhost. I have indexed the
example files shipped with solr-4.4.0; these are CSV or XML. Now I want to
index a MySQL database for a Django project, search user queries against it,
and also implement more features. What should I do?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Schema

2013-08-22 Thread SolrLover
Now use DIH to get the data from the MySQL database into Solr:

http://wiki.apache.org/solr/DataImportHandler

You need to define the field mapping (between MySQL and the Solr document) in
data-config.xml.
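
A minimal sketch of such a data-config.xml (the JDBC URL, credentials, table
and column names are all placeholders to adapt; the MySQL JDBC driver jar must
also be on Solr's classpath):

<dataConfig>
  <dataSource type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/mydb"
      user="dbuser" password="dbpass"/>
  <document>
    <entity name="item" query="SELECT id, name, description FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>

The /dataimport request handler also has to be registered in solrconfig.xml and
pointed at this file via its "config" parameter.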



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136p4086140.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Flushing cache without restarting everything?

2013-08-22 Thread Jean-Sebastien Vachon
I was afraid someone would tell me that... thanks for your input

 -Original Message-
 From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
 Sent: August-22-13 9:56 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Flushing cache without restarting everything?
 
 On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
  Is there a way to flush the cache of all nodes in a Solr Cloud (by
  reloading all the cores, through the collection API, ...) without
  having to restart all nodes?
 
 As MMapDirectory shares data with the OS disk cache, flushing of
 Solr-related caches on a machine should involve
 
 1) Shut down all Solr instances on the machine
  2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
 a Linux box)
 3) Start the Solr instances
 
 I do not know of any Solr-supported way to do step 2. For our
 performance tests we use custom scripts to perform the steps.
 
 - Toke Eskildsen, State and University Library, Denmark
 
 


Solr Ref guide question

2013-08-22 Thread yriveiro
Hi all,

I think that there is a gap in Solr's ref guide.

The section "Running Solr" says to run Solr using the command:

$ java -jar start.jar

But If I do this with a fresh install, I have a stack trace like this:
http://pastebin.com/5YRRccTx

Is this behavior expected?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Ref-guide-question-tp4086142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
I can do it like this, but then the content isn't copied to text; it's just in 
text_test.

<entity name="tika" processor="TikaEntityProcessor"
    url="${rec.path}${rec.file}" dataSource="dataUrl">
    <field column="text" name="text_test"/>
    <copyField source="text_test" dest="text" />
</entity>
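
Worth noting (my reading, not from the thread): copyField is a schema.xml
directive, not a DIH one, so DIH will most likely just ignore it inside
data-config.xml. The equivalent copy would be declared in schema.xml, roughly:

<copyField source="text_test" dest="text"/>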


On 22. Aug 2013, at 6:12 PM, Andreas Owen wrote:

 i put it in the tika-entity as attribute, but it doesn't change anything. my 
 bigger concern is why text_test isn't populated at all
 
 On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:
 
 Can you try SOLR-4530 switch:
 https://issues.apache.org/jira/browse/SOLR-4530
 
 Specifically, setting htmlMapper=identity on the entity definition. This
 will tell Tika to send full HTML rather than a seriously stripped one.
 
 Regards,
 Alex.
 
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen a...@conx.ch wrote:
 
 i'm trying to index a html page and only user the div with the
 id=content. unfortunately nothing is working within the tika-entity, only
 the standard text (content) is populated.
 
   do i have to use copyField for test_text to get the data?
   or is there a problem with the entity-hirarchy?
   or is the xpath wrong, even though i've tried it without and just
 using text?
   or should i use the updateextractor?
 
 data-config.xml:
 
 dataConfig
   dataSource type=BinFileDataSource name=data/
   dataSource type=BinURLDataSource name=dataUrl/
   dataSource type=URLDataSource baseUrl=
 http://127.0.0.1/tkb/internet/; name=main/
 document
   entity name=rec processor=XPathEntityProcessor
 url=docImportUrl.xml forEach=/docs/doc dataSource=main
   field column=title xpath=//title /
   field column=id xpath=//id /
   field column=file xpath=//file /
   field column=path xpath=//path /
   field column=url xpath=//url /
   field column=Author xpath=//author /
 
   entity name=tika processor=TikaEntityProcessor
 url=${rec.path}${rec.file} dataSource=dataUrl 
   !-- copyField source=text dest=text_test /
 --
   field column=text_test
 xpath=//div[@id='content'] /
   /entity
   /entity
 /document
 /dataConfig
 
 docImporterUrl.xml:
 
 ?xml version=1.0 encoding=utf-8?
 docs
 doc
   id5/id
   authortkb/author
   titleStartseite/title
   descriptionblabla .../description
   filehttp://localhost/tkb/internet/index.cfm/file
   urlhttp://localhost/tkb/internet/index.cfm/url/url
   path2http\specialConf/path2
   /doc
   doc
   id6/id
   authortkb/author
   titleEigenheim/title
   descriptionMachen Sie sich erste Gedanken über den
 Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein
 spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
 Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in finanzieller
 Hinsicht gelingt./description
   file
 http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/file
   url
 http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url/url
   /doc
 /docs



Re: How to SOLR file in svn repository

2013-08-22 Thread SolrLover
I don't think there's a Solr-SVN connector available out of the box.

You can write a custom SolrJ indexer program to get the necessary data from
SVN (using its Java API) and add the data to Solr.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904p4086144.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Schema

2013-08-22 Thread Kamaljeet Kaur
On Thu, Aug 22, 2013 at 10:56 PM, SolrLover [via Lucene]
ml-node+s472066n4086140...@n3.nabble.com wrote:

 Now use DIH to get the data from MYSQL database in to SOLR..

 http://wiki.apache.org/solr/DataImportHandler


Those are for versions 1.3, 1.4, 3.6 or 4.0.
Why are those versions mentioned there? Doesn't it work on Solr 4.4.0?


-- 
Kamaljeet Kaur

kamalkaur188.wordpress.com
facebook.com/kaur.188




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136p4086145.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-22 Thread Erick Erickson
Your first problem is that the terms aren't getting to the field
analysis chain as a unit. If you attach debug=query to your
query and, say, you're searching lastName:(ogden erickson),
you'll see something like
lastName:ogden lastName:erickson
when what you want is
lastname:"ogden erickson"
(note, this is the _parsed_ query, not the input string!)
So try escaping the space, as in
lastname:ogden\ erickson

As for the second problem, _how_ is it not working at all?
You're breaking up the input into separate tokens, which you
say you don't want to do. If you really want all your names to
be treated as strings just ignoring, say, the / take a look at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory
and use it with your first type.
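
A sketch of what that could look like (untested; adjust names as needed), so
that a value like MB56789/A is indexed and queried as a single lowercase token
with the '/' stripped:

<fieldType name="string_id_itm" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- remove "/" before the keyword tokenizer sees the value -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="/" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>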

Best
Erick


On Thu, Aug 22, 2013 at 7:34 AM, skorrapa korrapati.sus...@gmail.comwrote:

 Hello All,

 I am also facing a similar issue. I am using Solr 4.3.
 Following is the configuration I gave in schema.xml
  fieldType name=string_lower_case class=solr.TextField
 sortMissingLast=true omitNorms=true 
 analyzer type=index
 tokenizer class=solr.KeywordTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
   /analyzer
   analyzer type=query
 tokenizer class=solr.KeywordTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 /analyzer
 /fieldType
fieldType name=string_id_itm class=solr.TextField
 sortMissingLast=true omitNorms=true
   analyzer type=index
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
   /analyzer
   analyzer type=query
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 /analyzer
  /fieldType

 My requirement is that any string I give during search should be treated as
 a single string and try to find it, case insensitively.
 I have got strings like first name and last name(for this am using
 string_lower_case), and strings with special character '/'(for this am
 using
 string_id_itm ).
 But I am not getting results as expected. The first field type should also
 accept strings with spaces and give me results but it isn't, and the second
 field type doesnt work at all

 e.g of field values: John Smith (for field type 1)
   MB56789/A (for field type 2)
 Please help

 vehovmar wrote
  Thanks a lot for both replies. Helped me a lot. It seems that
  EdgeNGramFilterFactory on query analyzer was really my problem, I'll have
  to test it a little more to be sure.
 
 
  As for the bf parameter, I thinks it's quite fine as it is, from
  documentation:
 
  the bf parameter actually takes a list of function queries separated by
  whitespace and each with an optional boost
  Example: bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
 
  And I'm using field function, Example Syntax: myFloatField or
  field(myFloatField)
 
 
  Thanks again to both of you guys!





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086070.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Flushing cache without restarting everything?

2013-08-22 Thread Walter Underwood
We warm the file buffers before starting Solr to avoid spending time waiting 
for disk IO. The script is something like this:

for core in core1 core2 core3
do
find /apps/solr/data/${core}/index -type f | xargs cat > /dev/null
done

It makes a big difference in the first few minutes of service. Of course, it 
helps if you have enough RAM to hold the entire index.

wunder

On Aug 22, 2013, at 10:28 AM, Jean-Sebastien Vachon wrote:

 I was afraid someone would tell me that... thanks for your input
 
 -Original Message-
 From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
 Sent: August-22-13 9:56 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Flushing cache without restarting everything?
 
 On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
 Is there a way to flush the cache of all nodes in a Solr Cloud (by
 reloading all the cores, through the collection API, ...) without
 having to restart all nodes?
 
 As MMapDirectory shares data with the OS disk cache, flushing of
 Solr-related caches on a machine should involve
 
 1) Shut down all Solr instances on the machine
  2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
 a Linux box)
 3) Start the Solr instances
 
 I do not know of any Solr-supported way to do step 2. For our
 performance tests we use custom scripts to perform the steps.
 
 - Toke Eskildsen, State and University Library, Denmark
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: How to SOLR file in svn repository

2013-08-22 Thread Walter Underwood
After you connect to Subversion, you'll need parsers for code, etc.

You might want to try Krugle instead, since they have already written all that 
stuff: http://krugle.org/

wunder

On Aug 22, 2013, at 10:43 AM, SolrLover wrote:

 I  don't think there's an SOLR- SVN connector available out of the box.
 
 You can write a custom SOLRJ indexer program to get the necessary data from
 SVN (using JAVA API) and add the data to SOLR.
 
 



Highlighting and proximity search

2013-08-22 Thread geran
Hello, I am dealing with an issue of highlighting and so far the other posts
that I've read have not provided a solution.

When using proximity search (coming soon~10) I get some documents with no
highlights and some documents highlight these words even when they are not
in a 10 word proximity. 

Some more configuration details are below, any help is much appreciated. We
are running solr version 4.4.0.

Full example query:

hl.fragsize=0&hl.requireFieldMatch=true&sort=document_date_range+desc&hl.fragListBuilder=single&hl.fragmentsBuilder=colored&hl=true&version=2.2&rows=80&hl.highlightMultiTerm=true&df=text&hl.useFastVectorHighlighter=true&start=0&q=(text:(coming+soon~10))&hl.usePhraseHighlighter=true

Configuration of the field being queried:

 
  fragListBuilder name=single 
   class=solr.highlight.SingleFragListBuilder/

fieldType name=text_general class=solr.TextField
positionIncrementGap=100
  analyzer type=index
tokenizer class=solr.PatternTokenizerFactory
pattern='([\-]{2,})|([\s\.\?\!,:;\“\”])'/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.ASCIIFoldingFilterFactory/
filter class=solr.WordDelimiterFilterFactory
splitOnCaseChange=1 catenateAll=1 catenateNumbers=0 catenateWords=1
generateNumberParts=1 generateWordParts=0 preserveOriginal=1/  
filter class=solr.LowerCaseFilterFactory/
filter class=solr.PorterStemFilterFactory/
  /analyzer
  analyzer type=query
tokenizer class=solr.PatternTokenizerFactory
pattern='([\-]{2,})|([\s\.\?\!,:;\“\”])'/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.ASCIIFoldingFilterFactory/
filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
ignoreCase=true expand=true/
filter class=solr.WordDelimiterFilterFactory
splitOnCaseChange=1 catenateAll=0 catenateNumbers=0 catenateWords=1
generateNumberParts=1 generateWordParts=0/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.PorterStemFilterFactory/
  /analyzer
/fieldType

Configuration of highlighter in solrconfig.xml

 
  fragListBuilder name=single 
   class=solr.highlight.SingleFragListBuilder/


  
  fragmentsBuilder name=colored 
class=solr.highlight.ScoreOrderFragmentsBuilder
lst name=defaults
  str name=hl.tag.pre![CDATA[
   em style=background:yellow,em
style=background:lawngreen,
   em style=background:aquamarine,em
style=background:magenta,
   em style=background:palegreen,em
style=background:coral,
   em style=background:wheat,em style=background:khaki,
   em style=background:lime,em
style=background:deepskyblue]]/str
  str name=hl.tag.post/str
/lst
  /fragmentsBuilder



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-and-proximity-search-tp4086152.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to SOLR file in svn repository

2013-08-22 Thread Lance Norskog

You need to:
1) crawl the SVN database
2) index the files
3) make a UI that fetches the original file when you click on a search 
results.


Solr only has #2. If you run a subversion web browser app, you can 
download the developer-only version of the LucidWorks product and crawl 
the SVN web viewer. This will give you #1 and #3.


Lance

On 08/21/2013 09:00 AM, jiunarayan wrote:

I have a svn respository and svn file path. How can I SOLR search content on
the svn file.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Adding one core to an existing core?

2013-08-22 Thread Bruno Mannina

Thanks a lot !!!

On 22/08/2013 16:23, Andrea Gazzarini wrote:
First, a core is a separate index, so it is completely independent from 
the already existing core(s). So basically you don't need to reindex.


In order to have two cores (but the same applies for n cores): you 
must have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding to 
the instanceDir attribute). I said one or two because if the indexes 
configuration is basically the same (or something changes but is 
dynamically configured - i.e. core name) you can create two instances 
starting from the same configuration. I mean


solr persistent=true sharedLib=lib
 cores adminPath=/admin/cores
  core name=core0 instanceDir=*conf.dir* /
  core name=core1 instanceDir=*conf.dir* /
 /cores
/solr

Otherwise you must have two different conf directories that contain 
indexes configuration. You should already have a first one (the 
current core), you just need to have another conf dir with 
solrconfig.xml, schema.xml and other required files. In this case each 
core will have its own instanceDir.


solr persistent=true sharedLib=lib
 cores adminPath=/admin/cores
  core name=core0 instanceDir=*conf.dir.core0* /
  core name=core1 instanceDir=*conf.dir.core1* /
 /cores
/solr
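
For illustration, once the cores element is in solr.xml as above, a new core can also be created at runtime through the CoreAdmin API instead of editing the file by hand and restarting; a minimal sketch (host, port and names are only examples, and the instanceDir with its conf directory must already exist on disk):

http://localhost:8080/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1&config=solrconfig.xml&schema=schema.xml

With persistent="true" in solr.xml, a core created this way is written back to the file, so it survives a restart.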

Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

A small clarification: I'm on Ubuntu 12.04 LTS

On 22/08/2013 15:56, Bruno Mannina wrote:

Dear Users,

(Solr3.6 + Tomcat7)

I have been using Solr with one core for two years; I would now like to add 
another core (a new database).


Can I do this without re-indexing my core1?
Could you point me to a good tutorial for doing that?

(my current database is around 200 GB for 86,000,000 docs)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno












Solr cloud hash range set to null after recovery from index corruption

2013-08-22 Thread Rikke Willer

Hi,

I have a Solr cloud set up with 12 shards with 2 replicas each, divided on 6 
servers (each server hosting 4 cores). Solr version is 4.3.1.
Due to memory errors on one machine, 3 of its 4 indexes became corrupted. I 
unloaded the cores, repaired the indexes with the Lucene CheckIndex tool, and 
added the cores again.
Afterwards the Solr cloud hash range has been set to null for the shards with 
corrupt indexes.
Could anybody point me to why this has occurred, and more importantly, how to 
set the range on the shards again?
Thank you.

Best,

Rikke


Re: updating docs in solr cloud hangs

2013-08-22 Thread allrightname
Erick,

I've read over SOLR-4816 after finding your comment about the server-side
stack traces showing threads locked up over semaphores and I'm curious how
that issue cures the problem on the server-side as the patch only includes
client-side changes. Do the servers get so tied up shuffling documents
around when they're not sent to the master that they get blocked as
described? If they do get blocked due to shuffling documents around is a
client-side fix for this not more of a workaround than a true fix?

I'm entirely willing to apply this patch to all of the code I've got that
talks to my solr servers and try it out but I'm reluctant to because this
looks like a client-side fix to a server-side issue.

Thanks,
Greg



--
View this message in context: 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-tp4067388p4086160.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: updating docs in solr cloud hangs

2013-08-22 Thread Erick Erickson
Right, it's a little arcane. But the lockup is because the
various leaders send documents to each other and wait
for returns. If there are a _lot_ of incoming packets to
various leaders, it can generate the distributed deadlock.
So the shuffling you refer to is the root of the issue.

If the leaders only receive documents for the shard they're
a leader of, then they won't have to send updates to other
leaders and shouldn't hit this condition.

But you're right, this situation was encountered the first time
by SolrJ clients sending lots and lots or parallel requests,
I don't remember whether it was just one client with lots of
threads or many clients. If you're not using SolrJ, then
it won't do you much good since it's client-side only.

As far as being a true fix or not, you can look at it as
kicking the can down the road. This patch has several
advantages:
1 It should pave the way for, and move towards,
linear scalability as far as scaling up to many
many nodes when indexing from SolrJ.
2 It should improve throughput in the normal case as well.
3 Along the way it _should_ significantly lower (perhaps
remove entirely) the chance that this deadlock will occur,
again when indexing from SolrJ.

If you had a bunch of clients sending, say, posting csv files
to SolrCloud I'd guess you'd find this happening again.

So it's an improvement not a perfect cure. But if you think
it'd help

Best,
Erick


On Thu, Aug 22, 2013 at 3:23 PM, allrightname allrightn...@gmail.comwrote:

 Erick,

 I've read over SOLR-4816 after finding your comment about the server-side
 stack traces showing threads locked up over semaphores and I'm curious how
 that issue cures the problem on the server-side as the patch only includes
 client-side changes. Do the servers get so tied up shuffling documents
 around when they're not sent to the master that they get blocked as
 described? If they do get blocked due to shuffling documents around is a
 client-side fix for this not more of a workaround than a true fix?

 I'm entirely willing to apply this patch to all of the code I've got that
 talks to my solr servers and try it out but I'm reluctant to because this
 looks like a client-side fix to a server-side issue.

 Thanks,
 Greg



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-tp4067388p4086160.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Schema

2013-08-22 Thread Raymond Wiker
On Aug 22, 2013, at 19:53 , Kamaljeet Kaur kamal.kaur...@gmail.com wrote:
 On Thu, Aug 22, 2013 at 10:56 PM, SolrLover [via Lucene]
 ml-node+s472066n4086140...@n3.nabble.com wrote:
 
 Now use DIH to get the data from MYSQL database in to SOLR..
 
 http://wiki.apache.org/solr/DataImportHandler
 
 
 These are for versions 1.3, 1.4, 3.6 or 4.0.
 Why versions are mentioned there? Don't they work on solr 4.4.0?


Why don't you just try? 


Re: Schema

2013-08-22 Thread tamanjit.bin...@yahoo.co.in
Versions mentioned in the wiki only tell you that those features are
available from that version of Solr onwards. This is not a concern in your
case as you are using the latest version, so everything you find in the wiki
will be available in Solr 4.4



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136p4086163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to SOLR file in svn repository

2013-08-22 Thread Alexandre Rafalovitch
I don't think you can go into production with that. But cloudera
distribution (with Hue) might be a similar or better option.

Regards,
Alex
On 22 Aug 2013 14:38, Lance Norskog goks...@gmail.com wrote:

 You need to:
 1) crawl the SVN database
 2) index the files
 3) make a UI that fetches the original file when you click on a search
 results.

 Solr only has #2. If you run a subversion web browser app, you can
 download the developer-only version of the LucidWorks product and crawl the
 SVN web viewer. This will give you #1 and #3.

 Lance

 On 08/21/2013 09:00 AM, jiunarayan wrote:

 I have a svn respository and svn file path. How can I SOLR search content
 on
 the svn file.



 --
 View this message in context: http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: dataimporter tika fields empty

2013-08-22 Thread Alexandre Rafalovitch
Ah. That's because Tika processor does not support path extraction. You
need to nest one more level.

Regards,
  Alex
On 22 Aug 2013 13:34, Andreas Owen a...@conx.ch wrote:

 I can do it like this, but then the content isn't copied to text; it's just
 in text_test

 entity name=tika processor=TikaEntityProcessor
 url=${rec.path}${rec.file} dataSource=dataUrl 
 field column=text name=text_test
 copyField source=text_test dest=text /
 /entity


 On 22. Aug 2013, at 6:12 PM, Andreas Owen wrote:

  I put it in the tika-entity as an attribute, but it doesn't change
 anything. My bigger concern is why text_test isn't populated at all
 
  On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:
 
  Can you try SOLR-4530 switch:
  https://issues.apache.org/jira/browse/SOLR-4530
 
  Specifically, setting htmlMapper=identity on the entity definition.
 This
  will tell Tika to send full HTML rather than a seriously stripped one.
 
  Regards,
  Alex.
 
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
  On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen a...@conx.ch wrote:
 
  I'm trying to index an HTML page and only use the div with the
  id=content. Unfortunately nothing is working within the tika-entity, only
  the standard text (content) is populated.
 
    do I have to use copyField for test_text to get the data?
    or is there a problem with the entity-hierarchy?
    or is the xpath wrong, even though I've tried it without and just
    using text?
    or should I use the updateextractor?
 
  data-config.xml:
 
  dataConfig
dataSource type=BinFileDataSource name=data/
dataSource type=BinURLDataSource name=dataUrl/
dataSource type=URLDataSource baseUrl=
  http://127.0.0.1/tkb/internet/; name=main/
  document
entity name=rec processor=XPathEntityProcessor
  url=docImportUrl.xml forEach=/docs/doc dataSource=main
field column=title xpath=//title /
field column=id xpath=//id /
field column=file xpath=//file /
field column=path xpath=//path /
field column=url xpath=//url /
field column=Author xpath=//author /
 
entity name=tika processor=TikaEntityProcessor
  url=${rec.path}${rec.file} dataSource=dataUrl 
!-- copyField source=text dest=text_test /
  --
field column=text_test
  xpath=//div[@id='content'] /
/entity
/entity
  /document
  /dataConfig
 
  docImporterUrl.xml:
 
  ?xml version=1.0 encoding=utf-8?
  docs
  doc
id5/id
authortkb/author
titleStartseite/title
descriptionblabla .../description
filehttp://localhost/tkb/internet/index.cfm/file
urlhttp://localhost/tkb/internet/index.cfm/url/url
path2http\specialConf/path2
/doc
doc
id6/id
authortkb/author
titleEigenheim/title
descriptionMachen Sie sich erste Gedanken über den
  Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder
 gar ein
  spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
  Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in
 finanzieller
  Hinsicht gelingt./description
file
  http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/file
url
  http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url/url
/doc
  /docs




RE: updating docs in solr cloud hangs

2013-08-22 Thread Greg Walters
Thanks, Erick that's exactly the clarification/confirmation I was looking for!

Greg



Re: Solr Ref guide question

2013-08-22 Thread Brendan Grainger
What version of solr are you using? Have you copied a solr.xml from
somewhere else? I can almost reproduce the error you're getting if I put a
non-existent core in my solr.xml, e.g.:

solr

  cores adminPath=/admin/cores
core name=core0 instanceDir=a_non_existent_core /
  /cores
...


On Thu, Aug 22, 2013 at 1:30 PM, yriveiro yago.rive...@gmail.com wrote:

 Hi all,

 I think there is something missing in Solr's ref guide.

 Section Running Solr says to run solr using the command:

 $ java -jar start.jar

 But If I do this with a fresh install, I have a stack trace like this:
 http://pastebin.com/5YRRccTx

 Is this behavior expected?



 -
 Best regards
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Ref-guide-question-tp4086142.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Brendan Grainger
www.kuripai.com


How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
If I am using solr.SchemaSimilarityFactory to allow different similarities
for different fields, do I set discountOverlaps=true on the factory or
per field?

What is the syntax?   The below does not seem to work

similarity class=solr.BM25SimilarityFactory discountOverlaps=true  
similarity class=solr.SchemaSimilarityFactory discountOverlaps=true
 /

Tom


RE: How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Markus Jelsma
Hi Tom,

Don't set it as attributes but as lists as Solr uses everywhere:
similarity class=solr.SchemaSimilarityFactory
  bool name=discountOverlapstrue/bool
/similarity

For BM25 you can also set k1 and b which is very convenient!

Cheers
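
To the per-field half of the question: with SchemaSimilarityFactory declared globally, a similarity can also be placed inside an individual fieldType; a minimal sketch (the field type name and analyzer are only illustrative):

<fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- per-field similarity; only honoured when the global similarity is SchemaSimilarityFactory -->
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.75</float>
    <bool name="discountOverlaps">true</bool>
  </similarity>
</fieldType>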
 
 
-Original message-
 From:Tom Burton-West tburt...@umich.edu
 Sent: Thursday 22nd August 2013 22:42
 To: solr-user@lucene.apache.org
 Subject: How to set discountOverlaps="true" in Solr 4x schema.xml
 
 If I am using solr.SchemaSimilarityFactory to allow different similarities
 for different fields, do I set discountOverlaps=true on the factory or
 per field?
 
 What is the syntax?   The below does not seem to work
 
 similarity class=solr.BM25SimilarityFactory discountOverlaps=true  
 similarity class=solr.SchemaSimilarityFactory discountOverlaps=true
  /
 
 Tom
 


Re: How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
Thanks Markus,

I set it, but it seems to make no difference in the score or statistics
listed in the debugQuery or in the ranking.   I'm using a field with
CommonGrams and a huge list of common words, so there should be a huge
difference in the document length with and without discountOverlaps.

Is the default for Solr 4 true?

 similarity class=solr.BM25SimilarityFactory  
  float name=k11.2/float
  float name=b0.75/float
bool name=discountOverlapsfalse/bool
  /similarity



On Thu, Aug 22, 2013 at 4:58 PM, Markus Jelsma
markus.jel...@openindex.iowrote:

 Hi Tom,

 Don't set it as attributes but as lists as Solr uses everywhere:
 similarity class=solr.SchemaSimilarityFactory
   bool name=discountOverlapstrue/bool
 /similarity

 For BM25 you can also set k1 and b which is very convenient!

 Cheers


 -Original message-
  From:Tom Burton-West tburt...@umich.edu
  Sent: Thursday 22nd August 2013 22:42
  To: solr-user@lucene.apache.org
  Subject: How to set discountOverlaps="true" in Solr 4x
 schema.xml
 
  If I am using solr.SchemaSimilarityFactory to allow different
 similarities
  for different fields, do I set discountOverlaps=true on the factory or
  per field?
 
  What is the syntax?   The below does not seem to work
 
  similarity class=solr.BM25SimilarityFactory discountOverlaps=true  
  similarity class=solr.SchemaSimilarityFactory discountOverlaps=true
   /
 
  Tom
 



Re: How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
I should have said that I have set it both to true and to false and
restarted Solr each time and the rankings and info in the debug query
showed no change.

Does this have to be set at index time?

Tom






Storing query results

2013-08-22 Thread jfeist
I am in the process of setting up a search application that allows the user
to view paginated query results.  The documents are highly dynamic but I
want the search results to be static, i.e. I don't want the user to click
the next page button, the query reruns, and now he has a different set of
search results because the data changed while he was looking through it.  I
want the results stored somewhere else and the successive page queries to
draw from that.  I know Solr has query result caching, but I want to store
it entirely.  Does Solr provide any functionality like this?  I imagine it
doesn't, because then you'd need to specify how long to store it, etc.  I'm
using Solr 4.4.0.  I found someone asking something similar  here
http://lucene.472066.n3.nabble.com/storing-results-td476351.html   but
that was 6 years ago.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-query-results-tp4086182.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR Prevent solr of modifying fields when update doc

2013-08-22 Thread Luís Portela Afonso
Hi,

How can I prevent Solr from updating some fields when updating a doc?
The problem is, I have a UUID in a field named uuid, but it is not a unique 
key. When an RSS source updates a feed, Solr will update the doc with the 
same link, but it generates a new uuid. This is not desired, because this id 
is used by me to relate feeds to a user.

Can someone help me?

Many Thanks



Re: Storing query results

2013-08-22 Thread Ahmet Arslan
Hi jfeist,

Your mail reminds me this blog, not sure about solr though.

http://blog.mikemccandless.com/2011/11/searcherlifetimemanager-prevents-broken.html




 From: jfeist jfe...@llminc.com
To: solr-user@lucene.apache.org 
Sent: Friday, August 23, 2013 12:09 AM
Subject: Storing query results
 

I am in the process of setting up a search application that allows the user
to view paginated query results.  The documents are highly dynamic but I
want the search results to be static, i.e. I don't want the user to click
the next page button, the query reruns, and now he has a different set of
search results because the data changed while he was looking through it.  I
want the results stored somewhere else and the successive page queries to
draw from that.  I know Solr has query result caching, but I want to store
it entirely.  Does Solr provide any functionality like this?  I imagine it
doesn't, because then you'd need to specify how long to store it, etc.  I'm
using Solr 4.4.0.  I found someone asking something similar  here
http://lucene.472066.n3.nabble.com/storing-results-td476351.html   but
that was 6 years ago.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-query-results-tp4086182.html
Sent from the Solr - User mailing list archive at Nabble.com.

SOLR search by external fields

2013-08-22 Thread Vinay B,
What we need is similar to what is discussed here, except not as a filter
but as an actual query:
http://lucene.472066.n3.nabble.com/filter-query-from-external-list-of-Solr-unique-IDs-td1709060.html

We'd like to implement a query parser/scorer that would allow us to combine
SOLR searches with searching external fields. This is due to the limitation
of having to update an entire document even though only a field in the
document needs to be updated.

For example we have a database table called document_attributes containing
two columns document_id, attribute_id. The document_id corresponds to the
ID of the documents indexed in SOLR.

We'd like to be able to pass in a query like:

attribute_id:123 OR text:some_query
(attribute_id:123 OR attribute_id:456) AND text:some_query
etc...

Can we implement a plugin/module in SOLR that's able to parse the above
query and then fetch the document_ids associated with the attribute_id and
combine the results with the normal processing of SOLR search to return one
set of results for the entire query.

We'd appreciate any guidance on how to implement this if it is possible.


custom names for replicas in solrcloud

2013-08-22 Thread smanad
Hi, 

I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper
ensemble of 3 servers. And just 1 shard currently.

When I create collections using collections api it creates collections with
names, 
collection1_shard1_replica1, collection1_shard1_replica2,
collection1_shard1_replica3.
Is there any way to pass a custom name? or can I have all the replicas with
same name?

Any pointers will be much appreciated. 
Thanks, 
-Manasi 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Measuring SOLR performance

2013-08-22 Thread Roman Chyla
Hi Dmitry,
So it seems solrjmeter should not assume the adminPath - and perhaps needs
to be passed as an argument. When you set the adminPath, are you able to
access localhost:8983/solr/statements/admin/cores ?

roman


On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi Roman,

 I have noticed a difference with different solr.xml config contents. It is
 probably legit, but thought to let you know (tests run on fresh checkout as
 of today).

 As mentioned before, I have two cores configured in solr.xml. If the file
 is:

 [code]
 solr persistent=false

   !--
   adminPath: RequestHandler path to manage cores.
 If 'null' (or absent), cores will not be manageable via request handler
   --
   cores adminPath=/admin/cores host=${host:}
 hostPort=${jetty.port:8983} hostContext=${hostContext:solr}
 core name=metadata instanceDir=metadata /
 core name=statements instanceDir=statements /
   /cores
 /solr
 [/code]

 then the instruction:

 python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
 ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R
 cms -t /solr/statements -e statements -U 100

 works just fine. If however the solr.xml has adminPath set to /admin
 solrjmeter produces an error:

 [error]
 **ERROR**
   File solrjmeter.py, line 1386, in module
 main(sys.argv)
   File solrjmeter.py, line 1278, in main
 check_prerequisities(options)
   File solrjmeter.py, line 375, in check_prerequisities
 error('Cannot find admin pages: %s, please report a bug' % apath)
   File solrjmeter.py, line 66, in error
 traceback.print_stack()
 Cannot find admin pages: http://localhost:8983/solr/admin, please report a
 bug
 [/error]

 With both solr.xml configs the following url returns just fine:

 http://localhost:8983/solr/statements/admin/system?wt=json

 Regards,

 Dmitry



 On Wed, Aug 14, 2013 at 2:03 PM, Dmitry Kan solrexp...@gmail.com wrote:

  Hi Roman,
 
  This looks much better, thanks! The ordinary non-comarison mode works.
  I'll post here, if there are other findings.
 
  Thanks for quick turnarounds,
 
  Dmitry
 
 
  On Wed, Aug 14, 2013 at 1:32 AM, Roman Chyla roman.ch...@gmail.com
 wrote:
 
  Hi Dmitry, oh yes, late night fixes... :) The latest commit should make
 it
  work for you.
  Thanks!
 
  roman
 
 
  On Tue, Aug 13, 2013 at 3:37 AM, Dmitry Kan solrexp...@gmail.com
 wrote:
 
   Hi Roman,
  
   Something bad happened in fresh checkout:
  
   python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
   ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs
 60
  -R
   cms -t /solr/statements -e statements -U 100
  
   Traceback (most recent call last):
 File solrjmeter.py, line 1392, in module
   main(sys.argv)
 File solrjmeter.py, line 1347, in main
   save_into_file('before-test.json', simplejson.dumps(before_test))
 File /usr/lib/python2.7/dist-packages/simplejson/__init__.py, line
  286,
   in dumps
   return _default_encoder.encode(obj)
 File /usr/lib/python2.7/dist-packages/simplejson/encoder.py, line
  226,
   in encode
   chunks = self.iterencode(o, _one_shot=True)
 File /usr/lib/python2.7/dist-packages/simplejson/encoder.py, line
  296,
   in iterencode
   return _iterencode(o, 0)
 File /usr/lib/python2.7/dist-packages/simplejson/encoder.py, line
  202,
   in default
   raise TypeError(repr(o) +  is not JSON serializable)
   TypeError: __main__.ForgivingValue object at 0x7fc6d4040fd0 is not
  JSON
   serializable
  
  
   Regards,
  
   D.
  
  
   On Tue, Aug 13, 2013 at 8:10 AM, Roman Chyla roman.ch...@gmail.com
   wrote:
  
Hi Dmitry,
   
   
   
On Mon, Aug 12, 2013 at 9:36 AM, Dmitry Kan solrexp...@gmail.com
   wrote:
   
 Hi Roman,

 Good point. I managed to run the command with -C and double
 quotes:

 python solrjmeter.py -a -C g1,cms -c hour -x
  ./jmx/SolrQueryTest.jmx

 As a result got several files (html, css, js, csv) in the running
directory
 (any way to specify where the output should be stored in this
 case?)

   
i know it is confusing, i plan to change it - but later, now it is
 too
   busy
here...
   
   

 When I look onto the comparison dashboard, I see this:

 http://pbrd.co/17IRI0b

   
two things: the tests probably took more than one hour to finish, so
  they
are not aligned - try generating the comparison with '-c  14400'
  (ie.
4x3600 secs)
   
the other thing: if you have only two datapoints, the dygraph will
 not
   show
anything - there must be more datapoints/measurements
   
   
   

 One more thing: all the previous tests were run with softCommit
   disabled.
 After enabling it, the tests started to fail:

 $ python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
 ./queries/demo/demo.queries -s localhost -p 8983 -a
  --durationInSecs 60
-R
 g1 -t /solr/statements -e statements -U 100
 $ cd g1
 Reading 

Re: Flushing cache without restarting everything?

2013-08-22 Thread Dan Davis
be careful with drop_caches - make sure you sync first


On Thu, Aug 22, 2013 at 1:28 PM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:

 I was afraid someone would tell me that... thanks for your input

  -Original Message-
  From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
  Sent: August-22-13 9:56 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Flushing cache without restarting everything?
 
  On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
   Is there a way to flush the cache of all nodes in a Solr Cloud (by
   reloading all the cores, through the collection API, ...) without
   having to restart all nodes?
 
  As MMapDirectory shares data with the OS disk cache, flushing of
  Solr-related caches on a machine should involve
 
  1) Shut down all Solr instances on the machine
  2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
  a Linux box)
  3) Start the Solr instances
 
  I do not know of any Solr-supported way to do step 2. For our
  performance tests we use custom scripts to perform the steps.
 
  - Toke Eskildsen, State and University Library, Denmark
 
 



Removing duplicates during a query

2013-08-22 Thread Dan Davis
Suppose I have two documents with different id, and there is another field,
for instance content-hash which is something like a 16-byte hash of the
content.

Can Solr be configured to return just one copy, and drop the other if both
are relevant?

If Solr does drop one result, do you get any indication in the document
that was kept that there was another copy?


Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Floyd Wu
Alright, thanks for all your help. I finally fixed this problem using
PatternReplaceFilterFactory + WordDelimiterFilterFactory.

I first replace _ (underscore) using PatternReplaceFilterFactory and then
use WordDelimiterFilterFactory to generate the word and number parts to
increase search hits. Although this decreases search quality a little,
our users need a higher recall rate more than precision.

Thank you all.

Floyd
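
For readers hitting the same problem, a minimal sketch of such an analyzer chain (field type name and filter options are illustrative, not Floyd's exact configuration):

<fieldType name="text_underscore" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- turn underscores inside a token into spaces -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" " replace="all"/>
    <!-- split on the introduced spaces and on letter/digit boundaries, keeping the original token too -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>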





2013/8/22 Floyd Wu floyd...@gmail.com

 After trying some search case and different params combination of
 WordDelimeter. I wonder what is the best strategy to index string
 2DA012_ISO MARK 2 and can be search by term 2DA012?

 What if I just want _ to be removed both query/index time, what and how to
 configure?

 Floyd



 2013/8/22 Floyd Wu floyd...@gmail.com

 Thank you all.
 By the way, Jack I gonna by your book. Where to buy?
 Floyd


 2013/8/22 Jack Krupansky j...@basetechnology.com

 I thought that the StandardTokenizer always split on punctuation, 

 Proving that you haven't read my book! The section on the standard
 tokenizer details the rules that the tokenizer uses (in addition to
 extensive examples.) That's what I mean by deep dive.

 -- Jack Krupansky

 -Original Message- From: Shawn Heisey
 Sent: Wednesday, August 21, 2013 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to avoid underscore sign indexing problem?


 On 8/21/2013 7:54 PM, Floyd Wu wrote:

 When using StandardAnalyzer to tokenize string Pacific_Rim will get

 ST
 textraw_**bytesstartendtypeposition
 pacific_rim[70 61 63 69 66 69 63 5f 72 69 6d]011ALPHANUM1

 How to make this string to be tokenized to these two tokens Pacific,
 Rim?
 Set _ as stopword?
 Please kindly help on this.
 Many thanks.


 Interesting.  I thought that the StandardTokenizer always split on
 punctuation, but apparently that's not the case for the underscore
 character.

 You can always use the WordDelimeterFilter after the StandardTokenizer.

 http://wiki.apache.org/solr/**AnalyzersTokenizersTokenFilter**s#solr.**
 WordDelimiterFilterFactoryhttp://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

 Thanks,
 Shawn






Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Dan Davis
Ah, but what is the definition of punctuation in Solr?


On Wed, Aug 21, 2013 at 11:15 PM, Jack Krupansky j...@basetechnology.comwrote:

 I thought that the StandardTokenizer always split on punctuation, 

 Proving that you haven't read my book! The section on the standard
 tokenizer details the rules that the tokenizer uses (in addition to
 extensive examples.) That's what I mean by deep dive.

 -- Jack Krupansky

 -Original Message- From: Shawn Heisey
 Sent: Wednesday, August 21, 2013 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to avoid underscore sign indexing problem?


 On 8/21/2013 7:54 PM, Floyd Wu wrote:

 When using StandardAnalyzer to tokenize string Pacific_Rim will get

 ST
 textraw_**bytesstartendtypeposition
 pacific_rim[70 61 63 69 66 69 63 5f 72 69 6d]011ALPHANUM1

 How to make this string to be tokenized to these two tokens Pacific,
 Rim?
 Set _ as stopword?
 Please kindly help on this.
 Many thanks.


 Interesting.  I thought that the StandardTokenizer always split on
 punctuation, but apparently that's not the case for the underscore
 character.

 You can always use the WordDelimeterFilter after the StandardTokenizer.

 http://wiki.apache.org/solr/**AnalyzersTokenizersTokenFilter**s#solr.**
 WordDelimiterFilterFactoryhttp://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

 Thanks,
 Shawn



Re: More on topic of Meta-search/Federated Search with Solr

2013-08-22 Thread Dan Davis
You are right, but here's my null hypothesis for studying the impact on
relevance. Hash the query to deterministically seed a random number
generator. Pick one from column A or column B at random.

This is of course wrong - a query might find two non-relevant results in
corpus A and lots of relevant results in corpus B, leading to poor
precision because the two non-relevant documents are likely to show up on
the first page. You can weight on the size of the corpus, but weighting
is then probably wrong on any specific query.

It was an interesting thought experiment though.

Erik,

Since LucidWorks was dinged in the 2013 Magic Quadrant on Enterprise Search
due to a lack of Federated Search, the for-profit Enterprise Search
companies must be doing it some way. Maybe relevance suffers (a lot),
but you can do it if you want to.

I have read very little of the IR literature - enough to sound like I know
a little, but it is a very little.  If there is literature on this, it
would be an interesting read.


On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson erickerick...@gmail.comwrote:

 The lack of global TF/IDF has been answered in the past,
 in the sharded case, by usually you have similar enough
 stats that it doesn't matter. This pre-supposes a fairly
 evenly distributed set of documents.

 But if you're talking about federated search across different
 types of documents, then what would you rescore with?
 How would you even consider scoring docs that are somewhat/
 totally different? Think magazine articles and meta-data associated
 with pictures.

 What I've usually found is that one can use grouping to show
 the top N of a variety of results. Or show tabs with different
 types. Or have the app intelligently combine the different types
 of documents in a way that makes sense. But I don't know
 how you'd just get the right thing to happen with some kind
 of scoring magic.

 Best
 Erick


 On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote:

 I've thought about it, and I have no time to really do a meta-search
 during
 evaluation.  What I need to do is to create a single core that contains
 both of my data sets, and then describe the architecture that would be
 required to do blended results, with liberal estimates.

 From the perspective of evaluation, I need to understand whether any of
 the
 solutions to better ranking in the absence of global IDF have been
 explored? I suspect that one could retrieve a much larger than N set of
 results from a set of shards, re-score in some way that doesn't require
 IDF, e.g. storing both results in the same priority queue and *re-scoring*
 before *re-ranking*.

 The other way to do this would be to have a custom SearchHandler that
 works
 differently - it performs the query, retrieves all results deemed relevant
 by
 another engine, adds them to the Lucene index, and then performs the query
 again in the standard way.   This would be quite slow, but perhaps useful
 as a way to evaluate my method.

 I still welcome any suggestions on how such a SearchHandler could be
 implemented.





Re: Removing duplicates during a query

2013-08-22 Thread Dan Davis
OK - I see that this can be done with Field Collapsing/Grouping.  I also
see the mentions in the Wiki for avoiding duplicates using a 16-byte hash.

So, question withdrawn...
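
For reference, a minimal sketch of the two approaches just mentioned (field names are illustrative; content_hash is assumed to be a single-valued indexed field). Collapsing duplicates at query time with grouping, keeping one document per hash value:

http://localhost:8983/solr/collection1/select?q=foo&group=true&group.field=content_hash&group.limit=1&group.main=true

And computing the hash at index time with the deduplication update processor described on the wiki (with overwriteDupes=true it would instead delete the older duplicate):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">content_hash</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.MD5Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>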


On Thu, Aug 22, 2013 at 10:21 PM, Dan Davis dansm...@gmail.com wrote:

 Suppose I have two documents with different id, and there is another
 field, for instance content-hash which is something like a 16-byte hash
 of the content.

 Can Solr be configured to return just one copy, and drop the other if both
 are relevant?

 If Solr does drop one result, do you get any indication in the document
 that was kept that there was another copy?




Re: Distance sort on a multi-value field

2013-08-22 Thread Jeff Wartes

This is actually pretty far afield from my original subject, but it turns
out that I also had issues  with NRT and multi-field geospatial
performance in Solr 4, so I'll follow that up.


I've been testing and working with David's SOLR-5170 patch ever since he
posted it, and I pushed it into production with only some cosmetic changes
a few hours ago. 
I have a relatively low update and query rate for this particular query
type, (something like 2 updates/sec, 10 queries/sec) but a short
autosoftcommit time. (5 sec) Based on the data so far this patch looks
like it's brought my average response time down from 4 seconds to about
50ms.

Very nice!



On 8/20/13 7:37 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

The distance sorting code in SOLR-2155 is roughly equivalent to the code
that
RPT uses (RPT has its lineage in SOLR-2155 after all).  I just reviewed it
to double-check.  It's possible the behavior is slightly better in
SOLR-2155
because the cache (a Solr cache) contains normal hard-references whereas
RPT
has one based on weak references, which will linger longer.  But I think
the
likelihood of OOM is the same.

Any way, the current best option is
https://issues.apache.org/jira/browse/SOLR-5170  which I posted a few days
ago.

~ David


Billnbell wrote
 We have been using 2155 for over 6 months in production with over 2M
hits
 every 10 minutes. No OOM yet.
 
 2155 seems great, and would this issue be any worse than 2155?
 
 
 
 On Wed, Aug 14, 2013 at 4:08 PM, Jeff Wartes jwartes@ wrote:
 

 Hm, Give me all the stores that only have branches in this area might
 be
 a plausible use case for farthest distance.
 That's essentially a contains question though, so maybe that's
already
 supported? I guess it depends on how contains/intersects/etc handle
 multi-values. I feel like multi-value interaction really deserves its
own
 section in the documentation.


 I'm aware of the memory issue, but it seems like if you want sort
 multi-valued points, it's either this or try to pull in the 2155 patch.
 In
 general I'd rather go with the thing that's being maintained.


 Thanks for the code pointer. You're right, that doesn't look like
 something I can easily use for more general aggregate scoring control.
Ah
 well.



 On 8/14/13 12:35 PM, Smiley, David W. dsmiley@ wrote:

 
 
 On 8/14/13 2:26 PM, Jeff Wartes jwartes@ wrote:
 
 
 I'm still pondering aggregate-type operations for scoring
multi-valued
 fields (original thread: http://goo.gl/zOX53f ), and it occurred to
me
 that distance-sort with SpatialRecursivePrefixTreeFieldType must be
 doing
 something like that.
 
 It isn't.
 
 
 Somewhat surprisingly I don't see this in the documentation anywhere,
 but
 I presume the example query: (from:
 http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)
 q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}
 
 assigns the distance/score based on the *closest* lat/long if the
 sfield
 is a multi-valued field.
 
 Yes it does.
 
 
 That's a reasonable default, but it's a bit arbitrary. Can I sort
based
 on
 the *furthest* lat/long in the document? Or the average distance?
 
 Anyone know more about how this works and could give me some
pointers?
 
 I considered briefly supporting the farthest distance but dismissed it
 as
 I saw no real use-case.  I didn't think of the average distance;
that's
 plausible.  Any way, you're best bet is to dig into the code.  The
 relevant part is ShapeFieldCacheDistanceValueSource.
 
 FYI something to keep in mind:
 https://issues.apache.org/jira/browse/LUCENE-4698
 
 ~ David
 


 
 
 -- 
 Bill Bell

 billnbell@

 cell 720-256-8076





-
 Author: 
http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context:
http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp
4084666p4085797.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Distance sort on a multi-value field

2013-08-22 Thread David Smiley (@MITRE.org)
Awesome!

Be sure to watch the JIRA issue as it develops.  The patch will improve
(I've already improved it but not posted it) and one day a solution is bound
to get committed.

~ David


Jeff Wartes wrote
 This is actually pretty far afield from my original subject, but it turns
 out that I also had issues  with NRT and multi-field geospatial
 performance in Solr 4, so I'll follow that up.
 
 
 I've been testing and working with David's SOLR-5170 patch ever since he
 posted it, and I pushed it into production with only some cosmetic changes
 a few hours ago. 
 I have a relatively low update and query rate for this particular query
 type, (something like 2 updates/sec, 10 queries/sec) but a short
 autosoftcommit time. (5 sec) Based on the data so far this patch looks
 like it's brought my average response time down from 4 seconds to about
 50ms.
 
 Very nice!
 
 
 
 On 8/20/13 7:37 PM, David Smiley (@MITRE.org) DSMILEY@ wrote:
 
The distance sorting code in SOLR-2155 is roughly equivalent to the code
that
RPT uses (RPT has its lineage in SOLR-2155 after all).  I just reviewed it
to double-check.  It's possible the behavior is slightly better in
SOLR-2155
because the cache (a Solr cache) contains normal hard-references whereas
RPT
has one based on weak references, which will linger longer.  But I think
the
likelihood of OOM is the same.

Any way, the current best option is
https://issues.apache.org/jira/browse/SOLR-5170  which I posted a few days
ago.

~ David


Billnbell wrote
 We have been using 2155 for over 6 months in production with over 2M
hits
 every 10 minutes. No OOM yet.
 
 2155 seems great, and would this issue be any worse than 2155?
 
 
 
 On Wed, Aug 14, 2013 at 4:08 PM, Jeff Wartes jwartes@ wrote:
 

 Hm, Give me all the stores that only have branches in this area might
 be
 a plausible use case for farthest distance.
 That's essentially a contains question though, so maybe that's
already
 supported? I guess it depends on how contains/intersects/etc handle
 multi-values. I feel like multi-value interaction really deserves its
own
 section in the documentation.


 I'm aware of the memory issue, but it seems like if you want sort
 multi-valued points, it's either this or try to pull in the 2155 patch.
 In
 general I'd rather go with the thing that's being maintained.


 Thanks for the code pointer. You're right, that doesn't look like
 something I can easily use for more general aggregate scoring control.
Ah
 well.



 On 8/14/13 12:35 PM, Smiley, David W. dsmiley@ wrote:

 
 
 On 8/14/13 2:26 PM, Jeff Wartes jwartes@ wrote:
 
 
 I'm still pondering aggregate-type operations for scoring
multi-valued
 fields (original thread: http://goo.gl/zOX53f ), and it occurred to
me
 that distance-sort with SpatialRecursivePrefixTreeFieldType must be
 doing
 something like that.
 
 It isn't.
 
 
 Somewhat surprisingly I don't see this in the documentation anywhere,
 but
 I presume the example query: (from:
 http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)
 q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}
 
 assigns the distance/score based on the *closest* lat/long if the
 sfield
 is a multi-valued field.
 
 Yes it does.
 
 
 That's a reasonable default, but it's a bit arbitrary. Can I sort
based
 on
 the *furthest* lat/long in the document? Or the average distance?
 
 Anyone know more about how this works and could give me some
pointers?
 
 I considered briefly supporting the farthest distance but dismissed it
 as
 I saw no real use-case.  I didn't think of the average distance;
that's
 plausible.  Any way, you're best bet is to dig into the code.  The
 relevant part is ShapeFieldCacheDistanceValueSource.
 
 FYI something to keep in mind:
 https://issues.apache.org/jira/browse/LUCENE-4698
 
 ~ David
 


 
 
 -- 
 Bill Bell

 billnbell@

 cell 720-256-8076





-
 Author: 
http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context:
http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp
4084666p4085797.html
Sent from the Solr - User mailing list archive at Nabble.com.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp4084666p4086226.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to access latitude and longitude with only LatLonType?

2013-08-22 Thread David Smiley (@MITRE.org)
Hi Quan

You claim to be using LatLonType, yet the error you posted makes it clear
you are in fact using SpatialRecursivePrefixTreeFieldType (RPT).

Regardless of which spatial field you use, it's not clear to me what sort of
statistics could be useful on a spatial field.  The stats component doesn't
work with any of the spatial fields.  Well... it's possible to use
LatLonType and then do stats on just the latitude or just the longitude (you
should see the auto-generated fields for these in the online schema browser)
but that would unlikely be useful.

~ David


zhangquan913 wrote
 Hello All,
 
 I am currently doing a spatial query in solr. I indexed coordinates
 (type=location class=solr.LatLonType), but the following query failed.
 http://localhost/solr/quan/select?q=*:*&stats=true&stats.field=coordinates&stats.facet=township&rows=0
 It showed an error:
 Field type
 location{class=org.apache.solr.schema.SpatialRecursivePrefixTreeFieldType,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={distErrPct=0.025,
 class=solr.SpatialRecursivePrefixTreeFieldType, maxDistErr=0.09,
 units=degrees}} is not currently supported
 
 I don't want to create duplicate indexed field latitude and longitude.
 How can I use only coordinates to do this kind of stats on both latitude
 and longitude?
 
 Thanks,
 Quan





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-access-latitude-and-longitude-with-only-LatLonType-tp4086109p4086229.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Steve Rowe
Dan,

StandardTokenizer implements the word boundary rules from the Unicode Text 
Segmentation standard annex UAX#29:

   http://www.unicode.org/reports/tr29/#Word_Boundaries

Every character sequence within UAX#29 boundaries that contains a numeric or an 
alphabetic character is emitted as a term, and nothing else is emitted.

Punctuation can be included within a term, e.g. 1,248.99 or 192.168.1.1.

To split on underscores, you can convert underscores to e.g. spaces by adding 
PatternReplaceCharFilterFactory to your analyzer:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_" replacement=" "/>

This replacement will be performed prior to StandardTokenizer, which will then 
see token-splitting spaces instead of underscores.

Steve

On Aug 22, 2013, at 10:23 PM, Dan Davis dansm...@gmail.com wrote:

 Ah, but what is the definition of punctuation in Solr?
 
 
 On Wed, Aug 21, 2013 at 11:15 PM, Jack Krupansky 
 j...@basetechnology.comwrote:
 
 I thought that the StandardTokenizer always split on punctuation, 
 
 Proving that you haven't read my book! The section on the standard
 tokenizer details the rules that the tokenizer uses (in addition to
 extensive examples.) That's what I mean by deep dive.
 
 -- Jack Krupansky
 
 -Original Message- From: Shawn Heisey
 Sent: Wednesday, August 21, 2013 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to avoid underscore sign indexing problem?
 
 
 On 8/21/2013 7:54 PM, Floyd Wu wrote:
 
 When using StandardAnalyzer to tokenize string Pacific_Rim will get
 
 ST
 textraw_**bytesstartendtypeposition
 pacific_rim[70 61 63 69 66 69 63 5f 72 69 6d]011ALPHANUM1
 
 How to make this string to be tokenized to these two tokens Pacific,
 Rim?
 Set _ as stopword?
 Please kindly help on this.
 Many thanks.
 
 
 Interesting.  I thought that the StandardTokenizer always split on
 punctuation, but apparently that's not the case for the underscore
 character.
 
 You can always use the WordDelimeterFilter after the StandardTokenizer.
 
 http://wiki.apache.org/solr/**AnalyzersTokenizersTokenFilter**s#solr.**
 WordDelimiterFilterFactoryhttp://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
 
 Thanks,
 Shawn