Sorting results by last update date

2013-05-29 Thread Kamal Palei
Hi All
I am trying to sort the results as per last updated date. My url looks as
below.

fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0
&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip
&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid
&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0
&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date asc
With this I get the data in ascending order of last updated date.

If I am trying to sort data in descending order, I use below url

fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0
&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip
&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid
&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0
&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date desc

Here the data set is not ordered properly; it looks to me like the data is
ordered on the basis of score, not last updated date.

Can somebody tell me what I am missing here, and why *desc* is not working
properly for me?

Thanks
kamal


Re: What exactly happens to extant documents when the schema changes?

2013-05-29 Thread Dotan Cohen
On Tue, May 28, 2013 at 2:20 PM, Upayavira u...@odoko.co.uk wrote:
 The schema provides Solr with a description of what it will find in the
 Lucene indexes. If you, for example, changed a string field to an
 integer in your schema, that'd mess things up bigtime. I recently had to
 upgrade a date field from the 1.4.1 date field format to the newer
 TrieDateField. Given I had to do it on a live index, I had to add a new
 field (just using copyfield) and re-index over the top, as the old field
 was still in use. I guess, given my app now uses the new date field
 only, I could presumably reindex the old date field with the new
 TrieDateField format, but I'd want to try that before I do it for real.
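
For reference, the copyField upgrade described above might look roughly like this in
schema.xml (the field and type names here are assumptions, not the actual schema):

  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

  <field name="created"     type="date"  indexed="true" stored="true"/>
  <field name="created_tdt" type="tdate" indexed="true" stored="true"/>
  <copyField source="created" dest="created_tdt"/>

Re-indexing over the top then populates created_tdt while the old created field keeps
serving existing queries.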


Thank you for the insight. Unfortunately, with 20 million records and
growing by hundreds each minute (social media posts) I don't see that
I could ever reindex the data in a timely way.


 However, if you changed a single valued field to a multi-valued one,
 that's not an issue, as a field with a single value is still valid for a
 multi-valued field.

 Also, if you add a new field, existing documents will be considered to
 have no value in that field. If that is acceptable, then you're fine.

 I guess if you remove a field, then those fields will be ignored by
 Solr, and thus not impact anything. But I have to say, I've never tried
 that.

 Thus - changing the schema will only impact on future indexing. Whether
 your existing index will still be valid depends upon the changes you are
 making.

 Upayavira

Thanks.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: What exactly happens to extant documents when the schema changes?

2013-05-29 Thread Dotan Cohen
On Tue, May 28, 2013 at 3:58 PM, Jack Krupansky j...@basetechnology.com wrote:
 The technical answer: Undefined and not guaranteed.


I was afraid of that!

 Sure, you can experiment and see what the effects happen to be in any
 given release, and maybe they don't tend to change (too much) between most
 releases, but there is no guarantee that any given "change the schema but keep
 existing data without deleting the directory contents and doing a full reindex" will
 actually be benign or be what you expect.

 As a general proposition, when it comes to changing the schema and not
 deleting the directory and doing a full reindex, don't do it! Of course, we
 all know not to try to walk on thin ice, but a lot of people will try to do
 it anyway - and maybe it happens that most of the time the results are
 benign.


In the case of this particular application, reindexing really is
overly burdensome as the application is performing hundreds of writes
to the index per minute. How might I gauge how much spare I/O Solr
could commit to a reindex? All the data that I need is in fact in
stored fields.

Note that because the social media application that feeds our Solr
index is global, there are no 'off hours'.


 OTOH, you could file a Jira to propose that the effects of changing the
 schema but keeping the existing data should be precisely defined and
 documented, but, that could still change from release to release.


Seems like a lot of effort to document, for little benefit. I'm not
going to file it. I would like to know, though, is the schema
consulted at index time, query time, or both?


 From a practical perspective for your original question: If you suddenly add
 a field, there is no guarantee what will happen when you try to access that
 field for existing documents, or what will happen if you update existing
 documents. Sure, people can talk about what happens to be true today, but
 there is no guarantee for the future. Similarly for deleting a field from
 the schema, there is no guarantee about the status of existing data, even
 though people can chatter about what it seems to do today.

 Generally, you should design your application around contracts and what is
 guaranteed to be true, not what happens to be true from experiments or even
 experience. Granted, that is the theory and sometimes you do need to rely on
 experimentation and folklore and spotty or ambiguous documentation, but to
 the extent possible, it is best to avoid explicitly trying to rely on
 undocumented, uncontracted behavior.


Thanks. The application does change (added features) and we do not
want to loose old data.


 One question I asked long ago and never received an answer: what is the best
 practice for doing a full reindex - is it sufficient to first do a delete of
 *:*, or does the Solr index directory contents or even the directory
 itself need to be explicitly deleted first? I believe it is the latter, but
 the former seems to work, most of the time. Deleting the directory itself
 seems to be the best answer, to date - but no guarantees!


I don't have an answer for that, sorry!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Choosing specific fields for suggestions in SpellCheckerComponent

2013-05-29 Thread Shalin Shekhar Mangar
Hi Wilson,

I don't think SpellCheckComponent supports multiple fields in the same
dictionary. Am I missing something?


On Wed, May 29, 2013 at 10:24 AM, Wilson Passos wrpas...@gmail.com wrote:

 Hi everyone,


 I've been searching about how to configure the SpellCheckerComponent in
 Solr 4.0 to support suggestion queries based on a subset of the configured
 fields in schema.xml. Let's say the spell checking is configured to use
 these 4 fields:

 <field name="field1" type="text_general"/>
 <field name="field2" type="text_general"/>
 <field name="field3" type="text_general"/>
 <field name="field4" type="text_general"/>

 I'd like to know if there's any possibility to dynamically set the
 SpellCheckerComponent to suggest terms using just fields field2 and
 field3 instead of the default behavior, which always includes suggestions
 across the 4 defined fields.

 Thanks in advance for any help!




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery

2013-05-29 Thread Shalin Shekhar Mangar
I have opened https://issues.apache.org/jira/browse/SOLR-4870


On Tue, May 28, 2013 at 5:53 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 This sounds like a bug. I'll open an issue. Thanks!


 On Tue, May 28, 2013 at 2:29 PM, AlexeyK lex.kudi...@gmail.com wrote:

 The cluster state problem reported above is not an issue - it was caused
 by
 our own code.
 Speaking about the update log - I have noticed a strange behavior concerning
 the replay. The replay is *supposed* to be done for a predefined number of
 log entries, but actually it is always done for the whole last 2 tlogs.
 RecentUpdates.update() reads the log within while (numUpdates < numRecordsToKeep),
 but numUpdates is never incremented, so it only exits when the reader reaches EOF.







 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery

2013-05-29 Thread Shalin Shekhar Mangar
On Thu, May 23, 2013 at 7:00 PM, AlexeyK lex.kudi...@gmail.com wrote:

snip /


 From what I understood from the code, for each 'add' command there is a test
 for a 'delete by query'. If there is an older DBQ, it's run after the 'add'
 operation if its version > the 'add' version.
 In my case, there are a lot of documents to be inserted, and a single large
 DBQ. My question is: shouldn't this be done in bulk? Why is it necessary to
 run the DBQ after each insertion? Supposing there are 1000 insertions, it's
 run 1000 times.



As I understand it, this is done to handle out-of-order updates. Suppose a
client makes a few add requests and then invokes a DBQ but the DBQ reaches
the replicas before the last add request. In such a case, the DBQ is
executed after the add request to preserve consistency. We don't do that in
bulk because we don't know how long to wait for all add requests to arrive.
Also, the individual add requests may arrive via different threads (think
connection reset from leader to replica).

That being said, the scenario you describe of a 1000 insertions causing
DBQs to be run a large number of times (on recovery after restarting) could
be optimized. Note that the bug you discovered (SOLR-4870) does not affect
log replay because log replay on startup will replay all of the last two
transaction logs (unless they end with a commit). Only PeerSync is affected
by SOLR-4870.

You say that both nodes are leaders but the comment inside
DirectUpdateHandler2.addDoc() says that deletesAfter (i.e. reordered DBQs)
should always be null on leaders. So there's definitely something fishy
here. A quick review of the code leads me to believe that reordered DBQs
can happen on a leader as well. I'll investigate further.






-- 
Regards,
Shalin Shekhar Mangar.


Re: Sorting results by last update date

2013-05-29 Thread Shalin Shekhar Mangar
On Wed, May 29, 2013 at 12:10 PM, Kamal Palei palei.ka...@gmail.com wrote:

 Hi All
 I am trying to sort the results as per last updated date. My url looks as
 below.

 fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0
 &fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip
 &fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid
 &spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0
 &hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date asc
 With this I get the data in ascending order of last updated date.

 If I am trying to sort data in descending order, I use below url

 fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0
 &fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip
 &fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid
 &spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0
 &hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date desc

 Here the data set is not ordered properly; it looks to me like the data is
 ordered on the basis of score, not last updated date.

 Can somebody tell me what I am missing here, and why *desc* is not working
 properly for me?


What is the field type of last_update_date? Which version of Solr?

A side note: Using NOW in a filter query is inefficient because it doesn't
use your filter cache effectively. Round it to the nearest time interval
instead. See http://java.dzone.com/articles/solr-date-math-now-and-filter
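
For example, a rounded version of the filter above would be something like:

  fq=last_updated_date:[NOW/DAY-60DAYS TO NOW/DAY+1DAY]

which yields the same filter string all day long, so the filter cache can actually
reuse the cached entry.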

-- 
Regards,
Shalin Shekhar Mangar.


Re: delta-import tweaking?

2013-05-29 Thread Kristian Rink
Hi Shawn;

and first off, thanks bunches for your pointers.

Am Tue, 28 May 2013 09:31:54 -0600
schrieb Shawn Heisey s...@elyograg.org:
 My workaround was to store the highest indexed autoincrement value in
 a location outside Solr.  In my original Perl code, I dropped it into
 a file on NFS.  The latest iteration of my indexing code (Java, using 
 SolrJ) no longer uses DIH for regular indexing, but it still uses
 that stored autoincrement value, this time in another database
 table.  I do still use full-import for complete index rebuilds.

Well, overall, after playing with it a bit last night, I decided to also
go down the SolrJ way; we're likely to use this in the future anyway,
as the rest of our environment is Java too, so going for it right now
seems just the logical thing to do.

Thanks and all the best! 
Kristian 


Reindexing strategy

2013-05-29 Thread Dotan Cohen
I see that I do need to reindex my Solr index. The index consists of
20 million documents with a few hundred new documents added per minute
(social media data). The documents are mostly smaller than 1KiB of
data, but some may go as large as 10 KiB. All the data is text, and
all indexed fields are stored.

To reindex, I am considering adding a 'last_indexed' field, and having
a Python or Java application pull out N results every T seconds when
sorting on last_indexed asc. How might I determine good values for
N and T? I would like to know when the Solr index is 'overloaded', or
whatever happens to Solr when it is being pushed beyond the limits of
its hardware. What should I be looking at to know if Solr is over
stressed? Is looking at CPU and memory good enough? Is there a way to
measure I/O to the disk on which the Solr index is stored? Bear in
mind that while the reindex is happening, clients will be performing
searches and a few hundred documents will be written per minute. Note
that the machine running Solr is an EC2 instance running on Amazon Web
Services, and that the 'disk' on which the Solr index is stored is an
EBS volume.
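
For the disk I/O question specifically, one rough approach (assuming a Linux instance
with the sysstat package installed) is to watch the device backing the index with
iostat, e.g.:

  iostat -xm 5

and keep an eye on the r/s, w/s, await and %util columns; await and %util climbing at
the same time as query times is a good sign the EBS volume is the bottleneck.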

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Strange behavior on text field with number-text content

2013-05-29 Thread Erick Erickson
Hmmm, there are two things you _must_ get familiar with when diagnosing
these G..

1> admin/analysis. That'll show you exactly what the analysis chain does,
and it's not always obvious.
2> add &debug=query to your input and look at the parsed query results. For
instance,
 this name:4nSolution Inc. parses as name:4nSolution defaultfield:inc.

That doesn't explain why name=4nSolutions, except..

your index chain has splitOnCaseChange=1 and your query bit has
splitOnCaseChange=0
which doesn't seem right

Best
Erick


On Tue, May 28, 2013 at 10:31 AM, Алексей Цой alexey...@gmail.com wrote:

 solr-user-unsubscribe solr-user-unsubscr...@lucene.apache.org


 2013/5/28 Michał Matulka michal.matu...@gowork.pl

  Thanks for your responses, I must admit that after hours of trying I
 made some mistakes.
 So the most problematic phrase will now be:
 4nSolution Inc. which cannot be found using query:

 name:4nSolution

 or even

 name:4nSolution Inc.

 but can be using following queries:

 name:nSolution
 name:4
 name:inc

 Sorry for the mess, it turned out I didn't reindex fields after modifying the
 schema, so I thought that the problem also applied to 300letters.

 The cause of all of this is the WordDelimiter filter, defined as follows:

 <fieldType name="text" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
       add enablePositionIncrements=true in both the index and query
       analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
             preserveOriginal="1" />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
 </fieldType>

 and I still don't know why it behaves like that - after all, the
 preserveOriginal attribute is set to 1...

 On 28.05.2013 14:21, Erick Erickson wrote:

 Hmmm, with 4.x I get much different behavior than you're
 describing, what version of Solr are you using?

 Besides Alex's comments, try adding debug=query to the url and see what 
 comes
 out from the query parser.

 A quick glance at the code shows that DefaultAnalyzer is used, which doesn't 
 do
 any analysis, here's the javadoc...
  /**
* Default analyzer for types that only produces 1 verbatim token...
* A maximum size of chars to be read must be specified
*/

 so it's much like the string type. Which means I'm totally perplexed by 
 your
 statement that 300 and letters return a hit. Have you perhaps changed the
 field definition and not re-indexed?

 The behavior you're seeing really looks like somehow 
 WordDelimiterFilterFactory
 is getting into your analysis chain with settings that don't mash the parts 
 back
 together, i.e. you can set up WDDF to split on letter/number transitions, 
 index
 each and NOT index the original, but I have no explanation for how that
 could happen with the field definition you indicated

 FWIW,
 Erick

 On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

   What does analyzer screen say in the Web AdminUI when you try to do that?
 Also, what are the tokens stored in the field (also in Web AdminUI).

 I think it is very strange to have TextField without a tokenizer chain.
 Maybe you get a standard one assigned by default, but I don't know what the
 standard chain would be.

 Regards,

   Alex.
  On 28 May 2013 04:44, Michał Matulka michal.matu...@gowork.pl wrote:


  Hello,

 I've got the following problem. I have a text type in my schema and a field
 name of that type.
 That field contains data; there is, for example, a record 

Re: Keeping a rolling window of indexes around solr

2013-05-29 Thread Erick Erickson
I suspect you're worrying about something you don't need to. At 1 insert every
30 seconds, and assuming 30,000,000 records will fit on a machine (I've seen
this), you're talking 900,000,000 seconds' worth of data on a single box!
Or roughly 10,000 days' worth of data. Test, of course, YMMV.

Or I'm misunderstanding what 1 log insert means; I guess it could be a full
log file

But do the simple thing first, just let Solr do what it does by
default and periodically
do a delete by query on documents you want to roll off the end. Especially since
you say that queries happen every few days. The tricks for utilizing
hot shards are
probably not very useful for you with that low a query rate.
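
As a rough sketch (field name assumed), the periodic roll-off could be a
delete-by-query run from cron, something like:

  curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type: text/xml' \
    --data-binary '<delete><query>timestamp:[* TO NOW/DAY-30DAYS]</query></delete>'

with an ordinary fq=timestamp:[NOW/DAY-7DAYS TO *] on queries whenever you only want a
recent slice.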

Test, of course
Best
Erick

On Tue, May 28, 2013 at 8:42 PM, Saikat Kanjilal sxk1...@hotmail.com wrote:
 Volume of data:
 1 log insert every 30 seconds, queries done sporadically asynchronously every 
 so often at a much lower frequency every few days

 Also the majority of the requests are indeed going to be within a splice of 
 time (typically hours or at most a few days)

 Type of queries:
 Keyword or termsearch
 Search by guid (or id as known in the solr world)
 Reserved or percolation queries to be executed when new data becomes available
 Search by dates as mentioned above

 Regards


 Sent from my iPhone

 On May 28, 2013, at 4:25 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : This is kind of the approach used by elastic search , if I'm not using
 : solrcloud will I be able to use shard aliasing, also with this approach
 : how would replication work, is it even needed?

 you haven't said much about the volume of data you expect to deal with,
 nor have you really explained what types of queries you intend to do --
 ie: you said you were interested in a rolling window of indexes
 around n days of data but you never clarified why you think a
 rolling window of indexes would be useful to you or how exactly you would
 use it.

 The primary advantage of sharding by date is if you know that a large
 percentage of your queries are only going to be within a small range of
 time, and therefore you can optimize those requests to only hit the shards
 necessary to satisfy that small window of time.

 if the majority of requests are going to be across your entire n days of
 data, then date based sharding doesn't really help you -- you can just use
 arbitrary (randomized) sharding using periodic deleteByQuery commands to
 purge anything older than N days.  Query the whole collection by default,
 and add a filter query if/when you want to restrict your search to only a
 narrow date range of documents.

 this is the same general approach you would use on a non-distributed /
 non-SolrCloud setup if you just had a single collection on a single master
 replicated to some number of slaves for horizontal scaling.


 -Hoss



Re: split document or not

2013-05-29 Thread Hard_Club
But in this case the phrase frequency over the whole document will not be taken into
account, because the document is split into subdocuments. Or is that not true?





Re: Note on The Book

2013-05-29 Thread Yago Riveiro
IMHO I prefer narrative; as Erick says, explaining all use-cases is impossible, and
covering the base cases is a good start. Either way, I miss a book about Solr
different from a cookbook or a guide.

Regards.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, May 29, 2013 at 12:19 PM, Erick Erickson wrote:

 FWIW, picking up on Alexandre's point. One of my continual
 frustrations with virtually _all_
 technical books is they become endless pages of details without ever
 mentioning why
 the hell I should care. Unfortunately, explaining use-cases for
 everything would only make
 the book about 10,000 pages long. Siiigh.
  
 I guess you can take this as a vote for narrative
  
 Erick
  
 On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com 
 (mailto:j...@basetechnology.com) wrote:
  We'll have a blog for the book. We hope to have a first
  raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks.
  As soon as we get that process under control, we'll start the blog. I'll
  keep your email on file and keep you posted.
   
  -- Jack Krupansky
   
  -Original Message- From: Swati Swoboda
  Sent: Tuesday, May 28, 2013 1:36 PM
  To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
  Subject: RE: Note on The Book
   
   
  I'd definitely prefer the spiral bound as well. E-books are great and your
  draft version seems very reasonably priced (aka I would definitely get it).
   
  Really looking forward to this. Is there a separate mailing list / etc. for
  the book for those who would like to receive updates on the status of the
  book?
   
  Thanks
   
  Swati Swoboda
  Software Developer - Igloo Software
  +1.519.489.4120 sswob...@igloosoftware.com 
  (mailto:sswob...@igloosoftware.com)
   
  Bring back Cake Fridays – watch a video you’ll actually like
  http://vimeo.com/64886237
   
   
  -Original Message-
  From: Jack Krupansky [mailto:j...@basetechnology.com]
  Sent: Thursday, May 23, 2013 7:15 PM
  To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
  Subject: Note on The Book
   
  To those of you who may have heard about the Lucene/Solr book that I and two
  others are writing on Lucene and Solr, some bad and good news. The bad news:
  The book contract with O’Reilly has been canceled. The good news: I’m going
  to proceed with self-publishing (possibly on Lulu or even Amazon) a somewhat
  reduced scope Solr-only Reference Guide (with hints of Lucene). The scope of
  the previous effort was too great, even for O’Reilly – a book larger than
  800 pages (or even 600) that was heavy on reference and lighter on “guide”
  just wasn’t fitting in with their traditional “guide” model. In truth, Solr
  is just too complex for a simple guide that covers it all, let alone Lucene
  as well.
   
  I’ll announce more details in the coming weeks, but I expect to publish an
  e-book-only version of the book, focused on Solr reference (and plenty of
  guide as well), possibly on Lulu, plus eventually publish 4-8 individual
  print volumes for people who really want the paper. One model I may pursue
  is to offer the current, incomplete, raw, rough, draft as a $7.99 e-book,
  with the promise of updates every two weeks or a month as new and revised
  content and new releases of Solr become available. Maybe the individual
  e-book volumes would be $2 or $3. These are just preliminary ideas. Feel
  free to let me know what seems reasonable or excessive.
   
  For paper: Do people really want perfect bound, or would you prefer spiral
  bound that lies flat and folds back easily? I suppose we could offer both –
  which should be considered “premium”?
   
  I’ll announce more details next week. The immediate goal will be to get the
  “raw rough draft” available to everyone ASAP.
   
  For those of you who have been early reviewers – your effort will not have
  been in vain. I have all your comments and will address them over the next
  month or two or three.
   
  Just for some clarity, the existing Solr Wiki and even the recent
  contribution of the LucidWorks Solr Reference to Apache really are still
  great contributions to general knowledge about Solr, but the book is
  intended to go much deeper into detail, especially with loads of examples
  and a lot more narrative guide. For example, the book has a complete list of
  the analyzer filters, each with a clean one-liner description. Ditto for
  every parameter (although I would note that the LucidWorks Solr Reference
  does a decent job of that as well.) Maybe, eventually, everything in the
  book COULD (and will) be integrated into the standard Solr doc, but until
  then, a single, integrated reference really is sorely needed. And, the book
  has a lot of narrative guide and walking through examples as well. Over
  time, I’m sure both will evolve. And just to be clear, the book is not a
  simple repurposing of the Solr wiki content – EVERY 

How can a Tokenizer be CoreAware?

2013-05-29 Thread Benson Margulies
I am currently testing some things with Solr 4.0.0. I tried to make a
tokenizer CoreAware, and was rewarded with:

Caused by: org.apache.solr.common.SolrException: Invalid 'Aware'
object: com.basistech.rlp.solr.RLPTokenizerFactory@19336006 --
org.apache.solr.util.plugin.SolrCoreAware must be an instance of:
[org.apache.solr.request.SolrRequestHandler]
[org.apache.solr.response.QueryResponseWriter]
[org.apache.solr.handler.component.SearchComponent]
[org.apache.solr.update.processor.UpdateRequestProcessorFactory]
[org.apache.solr.handler.component.ShardHandlerFactory]

I need this to allow cleanup of some cached items in the tokenizer.

Questions:

1: will a newer version allow me to do this directly?
2: is there some other approach that anyone would recommend? I could,
for example, make a fake object in the list above to act as a
singleton with a static accessor, but that seems pretty ugly.


Re: Reindexing strategy

2013-05-29 Thread Upayavira
I presume you are running Solr on a multi-core/CPU server. If you kept a
single process hitting Solr to re-index, you'd be using just one of
those cores. It would take as long as it takes, I can't see how you
would 'overload' it that way. 

I guess you could have a strategy that pulls 100 documents with an old
last_indexed, and push them for re-indexing. If you get the full 100
docs, you make a subsequent request immediately. If you get less than
100 back, you know you're up-to-date and can wait, say, 30s before
making another request.

Upayavira

On Wed, May 29, 2013, at 12:00 PM, Dotan Cohen wrote:
 I see that I do need to reindex my Solr index. The index consists of
 20 million documents with a few hundred new documents added per minute
 (social media data). The documents are mostly smaller than 1KiB of
 data, but some may go as large as 10 KiB. All the data is text, and
 all indexed fields are stored.
 
 To reindex, I am considering adding a 'last_indexed' field, and having
 a Python or Java application pull out N results every T seconds when
 sorting on last_indexed asc. How might I determine a good values for
 N and T? I would like to know when the Solr index is 'overloaded', or
 whatever happens to Solr when it is being pushed beyond the limits of
 its hardware. What should I be looking at to know if Solr is over
 stressed? Is looking at CPU and memory good enough? Is there a way to
 measure I/O to the disk on which the Solr index is stored? Bear in
 mind that while the reindex is happening, clients will be performing
 searches and a few hundred documents will be written per minute. Note
 that the machine running Solr is an EC2 instance running on Amazon Web
 Services, and that the 'disk' on which the Solr index is stored in an
 EBS volume.
 
 Thank you.
 
 --
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com


Re: Reindexing strategy

2013-05-29 Thread Dotan Cohen
On Wed, May 29, 2013 at 2:41 PM, Upayavira u...@odoko.co.uk wrote:
 I presume you are running Solr on a multi-core/CPU server. If you kept a
 single process hitting Solr to re-index, you'd be using just one of
 those cores. It would take as long as it takes, I can't see how you
 would 'overload' it that way.


I mean 'overload' Solr in the sense that it cannot read, process, and
write data fast enough because too much data is being handled. I
remind you that this system is writing hundreds of documents per
minute. Certainly there is a limit to what Solr can handle. I ask how
to know how close I am to this limit.


 I guess you could have a strategy that pulls 100 documents with an old
 last_indexed, and push them for re-indexing. If you get the full 100
 docs, you make a subsequent request immediately. If you get less than
 100 back, you know you're up-to-date and can wait, say, 30s before
 making another request.


Actually, I would add a filter query for documents whose last_index
value is before the last schema change, and stop when fewer documents
were returned than were requested.
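
A rough SolrJ sketch of that loop (field names, URL, cutoff and batch size are all
assumptions, using the 4.x-era SolrJ API):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;
  import org.apache.solr.common.SolrInputDocument;

  public class Reindexer {
      public static void main(String[] args) throws Exception {
          SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
          final int batchSize = 100;                     // "N"
          final String cutoff = "2013-05-29T00:00:00Z";  // time of the last schema change

          while (true) {
              // oldest not-yet-reindexed documents first
              SolrQuery q = new SolrQuery("*:*");
              q.addFilterQuery("last_indexed:[* TO " + cutoff + "]");
              q.setSortField("last_indexed", SolrQuery.ORDER.asc);
              q.setRows(batchSize);
              SolrDocumentList batch = solr.query(q).getResults();

              for (SolrDocument d : batch) {
                  SolrInputDocument in = new SolrInputDocument();
                  for (String f : d.getFieldNames()) {
                      if (!"_version_".equals(f)) {
                          in.addField(f, d.getFieldValue(f));  // all needed fields are stored
                      }
                  }
                  in.setField("last_indexed", new java.util.Date());  // refresh the marker
                  solr.add(in);
              }
              solr.commit();

              if (batch.size() < batchSize) {
                  break;  // caught up; in practice sleep "T" seconds and loop again
              }
          }
          solr.shutdown();
      }
  }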

Thanks.


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Note on The Book

2013-05-29 Thread Alexandre Rafalovitch
Perhaps, you will enjoy mine then:
http://www.packtpub.com/apache-solr-for-indexing-data/book .

I will send a formal announcement to the list a little later, but
basically this is a book for advanced beginners and early
intermediates and takes them from a basic index to multilingual
indexing with bells and whistles. Covers a small part of Solr (Solr is
big!), but shows how different parts work together. It's structured as
a cookbook but the narrative is a journey.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, May 29, 2013 at 7:33 AM, Yago Riveiro yago.rive...@gmail.com wrote:
 IMHO I prefer narrative, as Erick says, explain all use-cases it's 
 impossible, cover the base cases is a good start.  Either way I miss a book 
 about solr different to a cookbook or a guide.

 Regards.

 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Wednesday, May 29, 2013 at 12:19 PM, Erick Erickson wrote:

 FWIW, picking up on Alexandre's point. One of my continual
 frustrations with virtually _all_
 technical books is they become endless pages of details without ever
 mentioning why
 the hell I should care. Unfortunately, explaining use-cases for
 everything would only make
 the book about 10,000 pages long. Siiigh.

 I guess you can take this as a vote for narrative

 Erick

 On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com 
 (mailto:j...@basetechnology.com) wrote:
  We'll have a blog for the book. We hope to have a first
  raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks.
  As soon as we get that process under control, we'll start the blog. I'll
  keep your email on file and keep you posted.
 
  -- Jack Krupansky
 
  -Original Message- From: Swati Swoboda
  Sent: Tuesday, May 28, 2013 1:36 PM
  To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
  Subject: RE: Note on The Book
 
 
  I'd definitely prefer the spiral bound as well. E-books are great and your
  draft version seems very reasonably priced (aka I would definitely get it).
 
  Really looking forward to this. Is there a separate mailing list / etc. for
  the book for those who would like to receive updates on the status of the
  book?
 
  Thanks
 
  Swati Swoboda
  Software Developer - Igloo Software
  +1.519.489.4120 sswob...@igloosoftware.com 
  (mailto:sswob...@igloosoftware.com)
 
  Bring back Cake Fridays – watch a video you’ll actually like
  http://vimeo.com/64886237
 
 
  -Original Message-
  From: Jack Krupansky [mailto:j...@basetechnology.com]
  Sent: Thursday, May 23, 2013 7:15 PM
  To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
  Subject: Note on The Book
 
  To those of you who may have heard about the Lucene/Solr book that I and 
  two
  others are writing on Lucene and Solr, some bad and good news. The bad 
  news:
  The book contract with O’Reilly has been canceled. The good news: I’m going
  to proceed with self-publishing (possibly on Lulu or even Amazon) a 
  somewhat
  reduced scope Solr-only Reference Guide (with hints of Lucene). The scope 
  of
  the previous effort was too great, even for O’Reilly – a book larger than
  800 pages (or even 600) that was heavy on reference and lighter on “guide”
  just wasn’t fitting in with their traditional “guide” model. In truth, Solr
  is just too complex for a simple guide that covers it all, let alone Lucene
  as well.
 
  I’ll announce more details in the coming weeks, but I expect to publish an
  e-book-only version of the book, focused on Solr reference (and plenty of
  guide as well), possibly on Lulu, plus eventually publish 4-8 individual
  print volumes for people who really want the paper. One model I may pursue
  is to offer the current, incomplete, raw, rough, draft as a $7.99 e-book,
  with the promise of updates every two weeks or a month as new and revised
  content and new releases of Solr become available. Maybe the individual
  e-book volumes would be $2 or $3. These are just preliminary ideas. Feel
  free to let me know what seems reasonable or excessive.
 
  For paper: Do people really want perfect bound, or would you prefer spiral
  bound that lies flat and folds back easily? I suppose we could offer both –
  which should be considered “premium”?
 
  I’ll announce more details next week. The immediate goal will be to get the
  “raw rough draft” available to everyone ASAP.
 
  For those of you who have been early reviewers – your effort will not have
  been in vain. I have all your comments and will address them over the next
  month or two or three.
 
  Just for some clarity, the existing Solr Wiki and even the recent
  contribution of the LucidWorks Solr Reference to Apache really are still
  great contributions 

Problem with xpath expression in data-config.xml

2013-05-29 Thread Hans-Peter Stricker
Replacing the contents of 
solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml


by

<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="beautybooks88" pk="link"
                url="http://beautybooks88.blogspot.com/feeds/posts/default"
                processor="XPathEntityProcessor" forEach="/feed/entry"
                transformer="DateFormatTransformer">

            <field column="source" xpath="/feed/title" commonField="true" />
            <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />

            <field column="title" xpath="/feed/entry/title" />
            <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
            <field column="description" xpath="/feed/entry/content" stripHTML="true" />
            <field column="creator" xpath="/feed/entry/author" />
            <field column="item-subject" xpath="/feed/entry/category/@term" />
            <field column="date" xpath="/feed/entry/updated" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
        </entity>
    </document>
</dataConfig>

and running the full dataimport from 
http://localhost:8983/solr/#/rss/dataimport//dataimport results in an error.


1) How could I have found the reason faster than I did - by looking into
which log files?


2) If you remove the first occurrence of /@href above, the import succeeds.
(Note that the same pattern works for the "link" column.) What's the reason
why?!


Best regards and thanks in advance

Hans-Peter 





Advice : High-traffic web site

2013-05-29 Thread Ramzi Alqrainy
Hi Team,

Please, I need your advice. I have a high-traffic web site (100 million page
views/month) across 22 countries and I want to build a fast and powerful search
engine. So, I use Solr 4.3 and separate every country into its own collection, but I
want to build the right structure to accommodate high traffic. So, what would you advise
me to use? SolrCloud, Master-Slave, or multi-cores?


Thanks in advance. 
Ramzi,





Re: split document or not

2013-05-29 Thread Hard_Club
Do I need to first search for the whole document ID and then search among its paragraphs
stored in separate docs?





Re: Note on The Book

2013-05-29 Thread Jack Krupansky
Erick, your point is well taken. Although my primary interest/skill is to 
produce a solid foundation reference (including tons of examples), the real 
goal is to then build on top of that foundation.


While I focus on the hard-core material - which really does include some 
narrative and lots of examples in addition to tons of mere reference, my 
co-author, Ryan Tabora, will focus almost exclusively on... narrative and 
diagrams.


And when I say reference, I also mean lots of examples. Even as the 
hard-core reference stabilizes, the examples will continue to grow (like 
weeds!).


Once we get the current, existing, under-review, chapters packaged into the 
new book and available for purchase and download (maybe Lulu, not decided) - 
available, in a couple of weeks, it will be updated approximately every 
other week, both with additional reference material, and additional 
narrative and diagrams.


One of our priorities (after we get through Stage 0 of the next few weeks) 
is to in fact start giving each of the long Deep Dive Chapters enough 
narrative lead to basically say exactly that - why you should care.


A longer-term priority is to improve the balance of narrative and hard-core 
reference. Yeah, that will be a lot of pages. It already is. We were at 907 
pages and I was about to drop in another 166 pages on update handlers when 
O'Reilly threw up their hands and pulled the plug. I was estimating 1200 
pages at that stage. And I'll probably have another 60-80 pages on update 
request processors within a week or so. With more to come. That did include 
a lot of hard-core material and example code for Lucene, which won't be in 
the new Solr-only book. By focusing on an e-book the raw page count alone 
becomes moot. We haven't given up on print - the intent is eventually to 
have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3 
to $5 each) and slimmer print volumes for people who don't need everything 
in print.


In fact, we will likely offer the revamped initial chapters of the book as a 
standalone introduction to Solr - narrative introduction (why should you 
care about Solr), basic concepts of Lucene and Solr (and why you should 
care!), brief tutorial walkthough of the major feature areas of Solr, and a 
case study. The intent would be both e-book and a slim print volume (75 
pages?).


Another priority (beyond Stage 0) is to develop a detailed roadmap diagram 
of Solr and how applications can use Solr, and then use that to show how
each of the Deep Dive sections fits in (heavy reference, but gradually adding more
narrative over time).


We will probably be very open to requests - what people really wish a book 
would actually do for them. The only request we won't be open to is to do it 
all in only 300 pages.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, May 29, 2013 7:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

FWIW, picking up on Alexandre's point. One of my continual
frustrations with virtually _all_
technical books is they become endless pages of details without ever
mentioning why
the hell I should care. Unfortunately, explaining use-cases for
everything would only make
the book about 10,000 pages long. Siiigh.

I guess you can take this as a vote for narrative

Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com 
wrote:

We'll have a blog for the book. We hope to have a first
raw/rough/partial/draft published as an e-book in maybe 10 days to 2 
weeks.

As soon as we get that process under control, we'll start the blog. I'll
keep your email on file and keep you posted.

-- Jack Krupansky

-Original Message- From: Swati Swoboda
Sent: Tuesday, May 28, 2013 1:36 PM
To: solr-user@lucene.apache.org
Subject: RE: Note on The Book


I'd definitely prefer the spiral bound as well. E-books are great and your
draft version seems very reasonably priced (aka I would definitely get 
it).


Really looking forward to this. Is there a separate mailing list / etc. 
for

the book for those who would like to receive updates on the status of the
book?

Thanks

Swati Swoboda
Software Developer - Igloo Software
+1.519.489.4120  sswob...@igloosoftware.com

Bring back Cake Fridays – watch a video you’ll actually like
http://vimeo.com/64886237


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, May 23, 2013 7:15 PM
To: solr-user@lucene.apache.org
Subject: Note on The Book

To those of you who may have heard about the Lucene/Solr book that I and 
two
others are writing on Lucene and Solr, some bad and good news. The bad 
news:
The book contract with O’Reilly has been canceled. The good news: I’m 
going
to proceed with self-publishing (possibly on Lulu or even Amazon) a 
somewhat
reduced scope Solr-only Reference Guide (with hints of Lucene). The scope 
of

the previous effort was too great, even for O’Reilly – a book larger than
800 pages (or 

Re: What exactly happens to extant documents when the schema changes?

2013-05-29 Thread Shawn Heisey
On 5/29/2013 1:07 AM, Dotan Cohen wrote:
 In the case of this particular application, reindexing really is
 overly burdensome as the application is performing hundreds of writes
 to the index per minute. How might I gauge how much spare I/O Solr
 could commit to a reindex? All the data that I need is in fact in
 stored fields.
 
 Note that because the social media application that feeds our Solr
 index is global, there are no 'off hours'.

I handle this in a very specific way with my sharded index.  This won't
work for all designs, and the precise procedure won't work for SolrCloud.

There is a 'live' and a 'build' core for each of my shards.  When I want
to reindex, the program makes a note of my current position for deletes,
reinserts, and new documents.  Then I use a DIH full-import from mysql
into the build cores.  Once the import is done, I run the update cycle
of deletes, reinserts, and new documents on those build cores, using the
position information noted earlier.  Then I swap the cores so the new
index is online.
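
The swap at the end can be done with the CoreAdmin API, e.g. (core names here are
made up):

  curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1_build&other=shard1_live'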

To adapt this for SolrCloud, I would need to use two collections, and
update a collection alias for what is considered live.

To control the I/O and CPU usage, you might need some kind of throttling
in your update/rebuild application.

I don't need any throttling in my design.  Because I'm using DIH, the
import only uses a single thread for each shard on the server.  I've got
RAID10 for storage and half of the CPU cores are still available for
queries, so it doesn't overwhelm the server.

The rebuild does lower performance, so I have the other copy of the
index handle queries while the rebuild is underway.  When the rebuild is
done on one copy, I run it again on the other copy.  Right now I'm
half-upgraded -- one copy of my index is version 3.5.0, the other is
4.2.1.  Switching to SolrCloud with sharding and replication would
eliminate this flexibility, unless I maintained two separate clouds.

Thanks,
Shawn



[Announce] Apache Solr 4.1 with RankingAlgorithm 1.4.7 available now -- includes realtime-search with multiple granularities

2013-05-29 Thread Nagendra Nagarajayya
I am very excited to announce the availability of Solr 4.3 with 
RankingAlgorithm40 1.4.8 with realtime-search with multiple 
granularities. realtime-search is very fast NRT and allows you to not 
only lookup a document by id but also allows you to search in realtime, 
see http://tgels.org/realtime-nrt.jsp. The update performance is about 
70,000 docs / sec. The query performance is in ms, allows you to  query 
a 10m wikipedia index (complete index) in 50 ms.


This release includes realtime-search with multiple granularities, 
request/intra-request. The granularity attribute controls the NRT 
behavior. With attribute granularity=request, all search components 
like search, faceting, highlighting, etc. will see a consistent view of 
the index and will all report the same number of documents. With 
granularity=intrarequest, the components may each report the most 
recent changes to the index. realtime-search has been contributed back 
to Apache Solr, see https://issues.apache.org/jira/browse/SOLR-3816.


RankingAlgorithm 1.4.8 supports the entire Lucene Query Syntax, ± and/or 
boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix 
queries with boosting, etc. and is compatible with the lucene 4.3 api.


You can get more information about realtime-search performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.3 with RankingAlgorithm40 1.4.8 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://elasticsearch-ra.tgels.org
http://rankingalgorithm.tgels.org

Note:
1. Apache Solr 4.1 with RankingAlgorithm40 1.4.7 is an external project.




Re: Advice : High-traffic web site

2013-05-29 Thread Shalin Shekhar Mangar
I don't see how multi-cores will help you. Both SolrCloud or Master-Slave
can work for you. Of course, SolrCloud helps you in terms of maintaining
higher availability due to replica/leader fail over.

If your queries are always going to be limited to one country then creating
a collection per country is fine.


On Wed, May 29, 2013 at 6:12 PM, Ramzi Alqrainy ramzi.alqra...@gmail.comwrote:

 Hi Team,

 Please I need your advice, I have high-traffic web site (100 million page
 views/month) to 22 country and I want to build fast and powerfull search
 engine. So, I use solr 4.3 and sperate every country to collection , but I
 want to build right structure to accommodates high traffic .So, What advise
 me to use? Solr cloud or Master-Slave or multi-cores .


 Thanks in advance.
 Ramzi,







-- 
Regards,
Shalin Shekhar Mangar.


Re: Reindexing strategy

2013-05-29 Thread Shawn Heisey
On 5/29/2013 6:01 AM, Dotan Cohen wrote:
 I mean 'overload' Solr in the sense that it cannot read, process, and
 write data fast enough because too much data is being handled. I
 remind you that this system is writing hundreds of documents per
 minute. Certainly there is a limit to what Solr can handle. I ask how
 to know how close I am to this limit.

It's impossible for us to give you hard numbers.  You'll have to
experiment to know how fast you can reindex without killing your
servers.  A basic tenet for such experimentation, and something you
hopefully already know: You'll want to get baseline measurements before
you begin testing for comparison.

One of the most reliable Solr-specific indicators of pushing your
hardware too hard is that the QTime on your queries will start to
increase dramatically.  Solr 4.1 and later has more granular query time
statistics in the UI - the median and 95% numbers are much more
important than the average.

Outside of that, if your overall IOwait CPU percentage starts getting
near (or above) 30-50%, your server is struggling.  If all of your CPU
cores are staying near 100% usage, then it's REALLY struggling.

Assuming you have plenty of CPU cores, using fast storage and having
plenty of extra RAM will alleviate much of the I/O bottleneck.  The
usual rule of thumb for good query performance is that you need enough
RAM to put 50-100% of your index in the OS disk cache.  For blazing
performance during a rebuild, that becomes 100-200%.  If you had 150%,
that would probably keep most indexes well-cached even during a rebuild.

A rebuild will always lower performance, even with lots of RAM.

My earlier reply to your other message has some other ideas that will
hopefully help.

Thanks,
Shawn



Re: Replica shards not updating their index when update is sent to them

2013-05-29 Thread Sebastián Ramírez
I found how to solve the problem.

After sending a file to be indexed to a replica shard (node2):

curl 'http://node2:8983/solr/update?commit=true' -H 'Content-type: text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">big moth</field></doc></add>'

I can send a commit param to the same shard and then it gets updated:

curl 'http://node2:8983/solr/update?commit=true'


Another option is to send, from the beginning, a commitWithin param with
some milliseconds instead of a commit directly. That way, the commit
happens at most (the milliseconds specified) after, but the changes get
reflected in all shards, including the replica shard that received the
update request:

curl 'http://node2:8983/solr/update?commitWithin=1'


As these emails get archived, I hope this may help someone in the future.

Sebastián Ramírez


On Mon, May 20, 2013 at 4:32 PM, Sebastián Ramírez 
sebastian.rami...@senseta.com wrote:

 Yes, It's happening with the latest version, 4.2.1

 Yes, it's easy to reproduce.
 It happened using 3 Virtual Machines and also happened using 3 physical
 nodes.


 Here are the details:

 I installed Hortonworks (a Hadoop distribution) in the 3 nodes. That
 installs Zookeeper.

 I used the example directory and copied it to the 3 nodes.

 I start Zookeeper in the 3 nodes.

 The first time, I run this command on each node, to start Solr:  java
 -jar -Dbootstrap_conf=true -DzkHost='node1,node2,node3'  start.jar

 As I understand, the -Dbootstrap_conf=true uploads the configuration to
 Zookeeper, so I don't need to do that the following times that I start each
 SolrCore.

 So, the following times, I run this on each node: java -jar
 -DzkHost='node0,node1,node2' start.jar

 Because I ran that command on node0 first, that node became the leader
 shard.

 I send an update to the leader shard, (in this case node0):
 I run curl 'http://node0:8983/solr/update?commit=true' -H 'Content-type: text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">buggy</field></doc></add>'

 When I query any shard I get the correct result:
 I run curl 'http://node0:8983/solr/select?q=id:asdf'
 or curl 'http://node1:8983/solr/select?q=id:asdf'
 or curl 'http://node2:8983/solr/select?q=id:asdf'
 (i.e. I send the query to each node), and then I get the expected response ...
 <doc><str name="id">asdf</str><arr name="content"><str>buggy</str></arr>
 ... </doc>...

 But when I send an update to a replica shard (node2) it is updated only in
 the leader shard (node0) and in the other replica (node1), not in the shard
 that received the update (node2):
 I send an update to the replica node2,
 I run curl 'http://node2:8983/solr/update?commit=true' -H 'Content-type: text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">big moth</field></doc></add>'

 Then I query each node and I receive the updated results only from the
 leader shard (node0) and the other replica shard (node1).

 I run (leader, node0):
 curl 'http://node0:8983/solr/select?q=id:asdf'
 And I get:
 ... <doc><str name="id">asdf</str><arr name="content"><str>big moth</str></arr> ... </doc> ...

 I run (other replica, node1):
 curl 'http://node1:8983/solr/select?q=id:asdf'
 And I get:
 ... <doc><str name="id">asdf</str><arr name="content"><str>big moth</str></arr> ... </doc> ...

 I run (first replica, the one that received the update, node2):
 curl 'http://node2:8983/solr/select?q=id:asdf'
 And I get (old result):
 ... <doc><str name="id">asdf</str><arr name="content"><str>buggy</str></arr> ... </doc> ...

 Thanks for your interest,

 Sebastián Ramírez


 On Mon, May 20, 2013 at 3:30 PM, Yonik Seeley yo...@lucidworks.comwrote:

 On Mon, May 20, 2013 at 4:21 PM, Sebastián Ramírez
 sebastian.rami...@senseta.com wrote:
  When I send an update to a non-leader (replica) shard (B), the updated
  results are reflected in the leader shard (A) and in the other replica
  shard (C), but not in the shard that received the update (B).

 I've never seen that before.  The replica that received the update
 isn't treated as special in any way by the code, so it's not clear how
 this could happen.

 What version of Solr is this (and does it happen with the latest
 version)?  How easy is this to reproduce for you?

 -Yonik
 http://lucidworks.com






RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
Andy,

I opened this ticket so that someone can eventually investigate: 
https://issues.apache.org/jira/browse/SOLR-4874

Just a sanity check: I see I had misspelled maxCollations as
maxCollation in my prior response.  When you tested with this set the same as
maxCollationTries, did you correct my spelling?  The thought is that by
requiring it to return this many collations back, you are guaranteed to make it
try the maximum number of times every time, giving yourself a cleaner test.  I am trying to
isolate here whether spellcheck is not running the queries properly or whether the
queries just naturally take that long to run over and over again.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Andy Lester [mailto:a...@petdance.com] 
Sent: Tuesday, May 28, 2013 4:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

Thanks for looking at this.

 What are the QTimes for the 0fq,1fq,2fq,4fq  4fq cases with spellcheck 
 entirely turned off?  Is it about (or a little more than) half the total when 
 maxCollationTries=1 ?

With spellcheck off I get 8ms for 4fq query.


  Also, with the varying # of fq's, how many collation tries does it take to 
 get 10 collations?

I don't know.  How can I tell?


 Possibly, a better way to test this is to set maxCollations = 
 maxCollationTries.  The reason is that it quits trying once it finds 
 maxCollations, so if with 0fq's, lots of combinations can generate hits and 
 it doesn't need to try very many to get to 10.  But with more fq's, fewer 
 collations will pan out so now it is trying more up to 100 before (if ever) 
 it gets to 10.

It does just fine doing 100 collations so long as there are no FQs.  It seems 
to me that the FQs are taking an inordinate amount of extra time.  100 
collations in (roughly) the same amount of time as a single collation, so long 
as there are no FQs.  Why are the FQs such a drag on the collation process?


 (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.


  So say with 2fq's it takes 10ms for the query to complete with spellcheck 
 off, and 20ms with maxCollation = maxCollationTries = 1, then it will take 
 about 110ms with maxCollation = maxCollationTries = 10.

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so 
long as I have FQs off.  Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so 
long as I have FQs off.  Add a single FQ and it becomes 62038ms.


 But I think you're just setting maxCollationTries too high.  You're asking it 
 to do too much work in trying tens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 
tries.  That's a big difference to the user where it's trying to figure 
misspelled phrases.

Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance





Escaping & character at Query

2013-05-29 Thread Furkan KAMACI
I use Solr 4.2.1 and I analyze that keyword:

kelile&dimle

at admin page:

WT

kelile&dimle

SF

kelile&dimle

TLCF

kelile&dimle

However when I escape that character and search it:

solr/select?q=kelile\&dimle

here is what I see:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">148</int>
 <lst name="params">
  <str name="dimle"/>
  *<str name="q">kelile\</str>*
 </lst>
</lst>

I have edismax as default query parser. How can I escape that &
character, why doesn't it like that?:

<str name="q">kelile\&dimle</str>

Any ideas?


RE: Choosing specific fields for suggestions in SpellCheckerComponent

2013-05-29 Thread Dyer, James
I assume here you've got a spellcheck field like this:

<field name="Spelling_Dictionary" type="text_general"/>
<copyField source="field1" dest="Spelling_Dictionary" />
<copyField source="field2" dest="Spelling_Dictionary" />
<copyField source="field3" dest="Spelling_Dictionary" />
<copyField source="field4" dest="Spelling_Dictionary" />

...so that a check against Spelling_Dictionary always checks all 4, right?  
This is the only way I know to approximate having it spellcheck across multiple 
fields.  And as you have found, short of creating several separate versions of 
Spelling_Dictionary, there is no way to specify the individual fields a la 
carte.  Although not supported, some of the work was done as part of SOLR-2993.

Your best bet now is to use Spelling_Dictionary as a master dictionary then 
use maxCollationTries to have it generate collations that only pertain the 
what the user actually searched against.  This is less efficient and may not 
work well (or at all) with Suggest.
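
As a rough sketch (the dictionary name here is a placeholder, adjust to whatever
your spellchecker is actually called), the relevant request parameters or handler
defaults would be something like:

  spellcheck=true
  spellcheck.dictionary=default
  spellcheck.collate=true
  spellcheck.maxCollations=5
  spellcheck.maxCollationTries=10

With maxCollationTries greater than zero, each candidate collation is re-run as a
query against the same q/fq the user sent, so collations that only make sense for
fields the user did not actually search get weeded out.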

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Wilson Passos [mailto:wrpas...@gmail.com] 
Sent: Tuesday, May 28, 2013 11:54 PM
To: Solr User List
Subject: Choosing specific fields for suggestions in SpellCheckerComponent

Hi everyone,


I've been searching about how to configure the SpellCheckerComponent in 
Solr 4.0 to support suggestion queries based on a subset of the 
configured fields in schema.xml. Let's say the spell checking is 
configured to use these 4 fields:

<field name="field1" type="text_general"/>
<field name="field2" type="text_general"/>
<field name="field3" type="text_general"/>
<field name="field4" type="text_general"/>

I'd like to know if there's any possibility to dynamically set the 
SpellCheckerComponent to suggest terms using just fields field2 and 
field3 instead of the default behavior, which always includes 
suggestions across the 4 defined fields.

Thanks in advance for any help!




Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Andy Lester

On May 29, 2013, at 9:46 AM, Dyer, James james.d...@ingramcontent.com wrote:

 Just an instanity check, I see I had misspelled maxCollations as 
 maxCollation in my prior response.  When you tested with this set the same 
 as maxCollationTries, did you correct my spelling?

Yes, definitely.

Thanks for the ticket.  I am looking at the effects of setting 
spellcheck.onlyMorePopular to true, which reduces the number of collations it 
seems to do, but doesn't affect the underlying question of whether the spellchecker 
is doing FQs properly.

Thanks,
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Escaping & character at Query

2013-05-29 Thread Carlos Bonilla
Hi,
try with double quotation marks ("").

Carlos.


2013/5/29 Furkan KAMACI furkankam...@gmail.com

 I use Solr 4.2.1 and I analyze that keyword:

 keliledimle

 at admin page:

 WT

 keliledimle

 SF

 keliledimle

 TLCF

 keliledimle

 However when I escape that charter and search it:

 solr/select?q=kelile\dimle

 here is what I see:

 response
 lst name=responseHeader
 int name=status0/int
 int name=QTime148/int
  lst name=params
   str name=dimle/
   *str name=qkelile\/str*
  /lst
 /lst

 I have edismax as default query parser. How can I escape that 
 character, why it doesn't like that?:

 str name=qkelile\dimle/str

 Any ideas?



using HTTP caching with shards in Solr 4.3

2013-05-29 Thread Ty
Hello,

I'd like to take advantage of Solr's HTTP caching feature (httpCaching
never304=false in solrconfig.xml)..  It is behaving as expected when I do
a standard query against a Solr instance and then repeat it: I receive an
HTTP304 (not modified) response.

However, when using the shards functionality, I seem to be unable to get
the HTTP304 functionality.  When sending a request to a Solr instance that
includes other Solr instances in the shards parameter, a GET request is
sent to the original Solr instance, but it turns around and sends POST
requests to the Solr instances referenced in shards.  Since POST requests
cannot generate a 304, I seem to be unable to use HTTP caching with shards.

Is there a way to make the original Solr instance query the shards with a
GET method?  Or some other way I can leverage HTTP caching when using
shards?

Thanks,
Ty


[Announce] Apache Solr 4.3 with RankingAlgorithm 1.4.8 available now -- includes realtime-search with multiple granularities (correction)

2013-05-29 Thread Nagendra Nagarajayya
I am very excited to announce the availability of Solr 4.3 with 
RankingAlgorithm40 1.4.8 with realtime-search with multiple 
granularities. realtime-search is very fast NRT and allows you to not 
only lookup a document by id but also allows you to search in realtime, 
see http://tgels.org/realtime-nrt.jsp. The update performance is about 
70,000 docs / sec. The query performance is in ms, allows you to  query 
a 10m wikipedia index (complete index) in 50 ms.


This release includes realtime-search with multiple granularities, 
request/intra-request. The granularity attribute controls the NRT 
behavior. With attribute granularity=request, all search components 
like search, faceting, highlighting, etc. will see a consistent view of 
the index and will all report the same number of documents. With 
granularity=intrarequest, the components may each report the most 
recent changes to the index. realtime-search has been contributed back 
to Apache Solr, see https://issues.apache.org/jira/browse/SOLR-3816.


RankingAlgorithm 1.4.8 supports the entire Lucene Query Syntax, ± and/or 
boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix 
queries with boosting, etc. and is compatible with the lucene 4.3 api.


You can get more information about realtime-search performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.3 with RankingAlgorithm40 1.4.8 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://elasticsearch-ra.tgels.org
http://rankingalgorithm.tgels.org

Note:
1. Apache Solr 4.3 with RankingAlgorithm40 1.4.8 is an external project.






Re: Escaping & character at Query

2013-05-29 Thread Furkan KAMACI
When I write:

solr/select?q=kelile\&dimle

it still says:

<lst name="params">
<str name="dimle"/>
*<str name="q">kelile\</str>*
</lst>



2013/5/29 Carlos Bonilla carlosbonill...@gmail.com

 Hi,
 try with double quotation marks ( ).

 Carlos.


 2013/5/29 Furkan KAMACI furkankam...@gmail.com

  I use Solr 4.2.1 and I analyze that keyword:
 
  keliledimle
 
  at admin page:
 
  WT
 
  keliledimle
 
  SF
 
  keliledimle
 
  TLCF
 
  keliledimle
 
  However when I escape that charter and search it:
 
  solr/select?q=kelile\dimle
 
  here is what I see:
 
  response
  lst name=responseHeader
  int name=status0/int
  int name=QTime148/int
   lst name=params
str name=dimle/
*str name=qkelile\/str*
   /lst
  /lst
 
  I have edismax as default query parser. How can I escape that 
  character, why it doesn't like that?:
 
  str name=qkelile\dimle/str
 
  Any ideas?
 



Re: Escaping & character at Query

2013-05-29 Thread Jack Krupansky

You need to URL-encode the & with %26:

...solr/select?q=kelile%26dimle

Normally, & introduces a new URL query parameter in the URL.
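
For example (host, port and core here are placeholders):

  curl 'http://localhost:8983/solr/select?q=kelile%26dimle'

If the & is left unencoded, the servlet container splits the query string on it, so
Solr sees q=kelile (or q=kelile\ with your backslash) plus an empty parameter named
dimle -- exactly what shows up in your echoed params.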

-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI 
Sent: Wednesday, May 29, 2013 10:55 AM 
To: solr-user@lucene.apache.org 
Subject: Escaping & character at Query 


I use Solr 4.2.1 and I analyze that keyword:

keliledimle

at admin page:

WT

keliledimle

SF

keliledimle

TLCF

keliledimle

However when I escape that charter and search it:

solr/select?q=kelile\dimle

here is what I see:

response
lst name=responseHeader
int name=status0/int
int name=QTime148/int
lst name=params
 str name=dimle/
 *str name=qkelile\/str*
/lst
/lst

I have edismax as default query parser. How can I escape that 
character, why it doesn't like that?:

str name=qkelile\dimle/str

Any ideas?


Re: Escaping & character at Query

2013-05-29 Thread Carlos Bonilla
Hi, I meant:

solr/select?q="kelile&dimle"

Cheers.



2013/5/29 Jack Krupansky j...@basetechnology.com

 You need to UUEncode the  with %26:

 ...solr/select?q=kelile%**26dimle

 Normally,  introduces a new URL query parameter in the URL.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI Sent: Wednesday, May 29,
 2013 10:55 AM To: solr-user@lucene.apache.org Subject: Escaping 
 character at Query
 I use Solr 4.2.1 and I analyze that keyword:

 keliledimle

 at admin page:

 WT

 keliledimle

 SF

 keliledimle

 TLCF

 keliledimle

 However when I escape that charter and search it:

 solr/select?q=kelile\dimle

 here is what I see:

 response
 lst name=responseHeader
 int name=status0/int
 int name=QTime148/int
 lst name=params
  str name=dimle/
  *str name=qkelile\/str*
 /lst
 /lst

 I have edismax as default query parser. How can I escape that 
 character, why it doesn't like that?:

 str name=qkelile\dimle/str

 Any ideas?



Re: Problem with xpath expression in data-config.xml

2013-05-29 Thread Shalin Shekhar Mangar
On Wed, May 29, 2013 at 6:05 PM, Hans-Peter Stricker
stric...@epublius.dewrote:

 Replacing the contents of
 solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml

 by

 <dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="beautybooks88" pk="link"
                url="http://beautybooks88.blogspot.com/feeds/posts/default"
                processor="XPathEntityProcessor" forEach="/feed/entry"
                transformer="DateFormatTransformer">
            <field column="source" xpath="/feed/title" commonField="true" />
            <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
            <field column="title" xpath="/feed/entry/title" />
            <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
            <field column="description" xpath="/feed/entry/content" stripHTML="true" />
            <field column="creator" xpath="/feed/entry/author" />
            <field column="item-subject" xpath="/feed/entry/category/@term" />
            <field column="date" xpath="/feed/entry/updated"
                   dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
        </entity>
    </document>
 </dataConfig>

 and running the full dataimport from
 http://localhost:8983/solr/#/rss/dataimport//dataimport results in an error.

 1) How could I have found the reason faster than I did - by looking into
 which log files?


DIH uses the same log file as solr. The name/location of the log file
depends on your logging configuration.


 2) If you remove the first occurrence of /@href above, the import
 succeeds. (Note that the same pattern works for column link.) What's the
 reason why?!!


I think there is a bug here. In my tests, xpath=/root/a/@y
works, xpath=/root/a[@x='1']/@y also works. But if you use them together
the one which is defined last returns null. I'll open an issue.
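
For reference, the kind of tiny test document I mean (made up purely for
illustration):

<root>
  <a x="1" y="first"/>
  <a x="2" y="second"/>
</root>

Individually, xpath="/root/a/@y" gives first/second and xpath="/root/a[@x='1']/@y"
gives first; it is only when both fields are defined in the same entity that the
one defined last comes back null.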


-- 
Regards,
Shalin Shekhar Mangar.


Re: Escaping & character at Query

2013-05-29 Thread Jack Krupansky

So, make it:

solr/select?q=kelile%26dimle

-- Jack Krupansky

-Original Message- 
From: Carlos Bonilla 
Sent: Wednesday, May 29, 2013 11:39 AM 
To: solr-user@lucene.apache.org 
Subject: Re: Escaping & character at Query 


Hi, I meant:

solr/select?q=keliledimle

Cheers.



2013/5/29 Jack Krupansky j...@basetechnology.com


You need to UUEncode the  with %26:

...solr/select?q=kelile%**26dimle

Normally,  introduces a new URL query parameter in the URL.

-- Jack Krupansky

-Original Message- From: Furkan KAMACI Sent: Wednesday, May 29,
2013 10:55 AM To: solr-user@lucene.apache.org Subject: Escaping 
character at Query
I use Solr 4.2.1 and I analyze that keyword:

keliledimle

at admin page:

WT

keliledimle

SF

keliledimle

TLCF

keliledimle

However when I escape that charter and search it:

solr/select?q=kelile\dimle

here is what I see:

response
lst name=responseHeader
int name=status0/int
int name=QTime148/int
lst name=params
 str name=dimle/
 *str name=qkelile\/str*
/lst
/lst

I have edismax as default query parser. How can I escape that 
character, why it doesn't like that?:

str name=qkelile\dimle/str

Any ideas?



Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Nicholas Fellows
I also have problems getting the Solr spellchecker to utilise existing FQ
params correctly.
We have some fairly monster queries

eg : http://pastebin.com/4XzGpfeC

I cannot seem to get our FQ parameters to be honored when generating
results.
In essence i am getting collations that yield no results when the filter
query is applied.

We have items that are by default not shown when out of stock or
forthcoming. the user
can select whether to show these or not.

Is there something wrong with my query or perhaps my use case is not
supported?

Im using nested query and local params etc

Would very much appreciate some assistance on this one, as 2 days' worth of
hacking and pestering
people on IRC have not yet yielded a solution for me. I'm not even sure whether what
I am trying
is even possible! Some sort of clarification on this would really help!

Cheers

Nick...




On 29 May 2013 15:57, Andy Lester a...@petdance.com wrote:


 On May 29, 2013, at 9:46 AM, Dyer, James james.d...@ingramcontent.com
 wrote:

  Just an instanity check, I see I had misspelled maxCollations as
 maxCollation in my prior response.  When you tested with this set the
 same as maxCollationTries, did you correct my spelling?

 Yes, definitely.

 Thanks for the ticket.  I am looking at the effects of turning on
 spellcheck.onlyMorePopular to true, which reduces the number of collations
 it seems to do, but doesn't affect the underlying question of is the
 spellchecker doing FQs properly?

 Thanks,
 Andy

 --
 Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance




-- 
Nick Fellows
DJdownload.com
---
10 Greenland Street
London
NW10ND
United Kingdom
---
n...@djdownload.com (E)

---
www.djdownload.com


Re: Not able to search Spanish word with accent in Solr

2013-05-29 Thread jignesh
Solr returns error 500 when I post data with accented chars...

Any solution for that?






Re: Re: error while indexing huge filesystem with data import handler and FileListEntityProcessor

2013-05-29 Thread jerome . dupont


The configuration works with LineEntityProcessor, with a few documents (haven't
tested with many documents yet).
For information, this is the config:
<dataConfig>
    <dataSource name="myfilelist" baseUrl="file:///D:/jed/noticesBib/"
                type="URLDataSource" encoding="UTF-8" />

    <document>

        <!-- config with a file containing the list of XML files to open -->
        <entity name="noticebib"
                datasource="myfilelist"
                processor="LineEntityProcessor"
                acceptLineRegex="^.*\.xml$"
                url="listeNotices.txt"
                rootEntity="false"
                transformer="LogTransformer"
                logTemplate="In entity noticebib" logLevel="debug">

            <entity name="processorDocument"
                    processor="XPathEntityProcessor"
                    url="file:///D:/${noticebib.rawLine}"
                    xsl="xslt/mnb/IXM_MNb.xsl"
                    forEach="/record"
                    transformer="fr.bnf.solr.BnfDateTransformer,LogTransformer"
                    logTemplate="In entity processorDocument fichier: file:///D:/${noticebib.rawLine}"
                    logLevel="debug">

... field definitions

file:///D:/jed/noticesBib/listeNotices.txt contains the following lines
jed/noticesBib/3/4/307/34307035.xml
jed/noticesBib/3/4/307/34307082.xml
jed/noticesBib/3/4/307/34307110.xml
jed/noticesBib/3/4/307/34307197.xml
jed/noticesBib/3/4/307/34307350.xml
jed/noticesBib/3/4/307/34307399.xml
...
(It could have contained the full location from the beginning, but I wanted to
test the concatenation of the filename.)

That works fine, thanks for the help!!

Next step, the same without using a file. (I'll write it in another post).

Regards,
Jérôme

Exhibition "Guy Debord, an art of war" - 27 March to 13 July 2013 - 
BnF - François-Mitterrand / Grande Galerie. Before printing, think of the 
environment. 

Re: Problem with xpath expression in data-config.xml

2013-05-29 Thread Shalin Shekhar Mangar
I created https://issues.apache.org/jira/browse/SOLR-4875


On Wed, May 29, 2013 at 9:15 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:


 On Wed, May 29, 2013 at 6:05 PM, Hans-Peter Stricker stric...@epublius.de
  wrote:

 Replacing the contents of solr-4.3.0\example\example-**
 DIH\solr\rss\conf\rss-data-**config.xml

 by

 dataConfig
dataSource type=URLDataSource /
document
entity name=beautybooks88  pk=link url=http://beautybooks88.
 **blogspot.com/feeds/posts/**defaulthttp://beautybooks88.blogspot.com/feeds/posts/default
 processor=**XPathEntityProcessor forEach=/feed/entry transformer=**
 DateFormatTransformer
 field column=source xpath=/feed/title
 commonField=true /
 field column=source-link
 xpath=/feed/link[@rel='self']**/@href commonField=true /

 field column=title xpath=/feed/entry/title /
 field column=link
 xpath=/feed/entry/link[@rel='**self']/@href /
 field column=description
 xpath=/feed/entry/content stripHTML=true/
 field column=creator
 xpath=/feed/entry/author /
 field column=item-subject
 xpath=/feed/entry/category/@**term/
 field column=date xpath=/feed/entry/updated
 dateTimeFormat=-MM-dd'T'**HH:mm:ss /
 /entity
/document
 /dataConfig

 and running the full dataimport from http://localhost:8983/solr/#/**
 rss/dataimport//dataimporthttp://localhost:8983/solr/#/rss/dataimport//dataimportresults
  in an error.

 1) How could I have found the reason faster than I did - by looking into
 which log files,?


 DIH uses the same log file as solr. The name/location of the log file
 depends on your logging configuration.


 2) If you remove the first occurrence of /@href above, the import
 succeeds. (Note that the same pattern works for column link.) What's the
 reason why?!!


 I think there is a bug here. In my tests, xpath=/root/a/@y
 works, xpath=/root/a[@x='1']/@y also works. But if you use them together
 the one which is defined last returns null. I'll open an issue.


 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Not able to search Spanish word with accent in Solr

2013-05-29 Thread Gora Mohanty
On 29 May 2013 21:39, jignesh js.vishava...@gmail.com wrote:
 Solr returning error 500, when i post data with ascent chars...

 Any solution for that?
[...]

Please look in the Solr logs for the
appropriate error message.

Regards,
Gora


Solr Cloud Using Zookeeper SASL

2013-05-29 Thread Don Tran
Hiya all,

Got a question that I hope someone can help me with.
I was just wondering if anyone has ever used Solr Cloud using Zookeepers that 
have SASL authentication turned on?
I can't seem to find any documentation on it so any help at all would be 
amazing!

Thanks,


Don Tran
Developer
Omnifone
Island Studios
47 British Grove
London W4 2NL, UK
T: +44 (0)20 8600 0580
F: +44 (0)20 8600 0581
S:  DonTranOmnifone
E:  dt...@omnifone.commailto:dt...@omnifone.com



RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
Instead of maxCollationTries=0, use a value greater than zero.  Zero means 
not to check if the collation will return hits.  1 means to test 1 possible 
combination against the index and return it only if it returns hits.  2 tries 
up to 2 possibilities, etc.  As you have spellcheck.maxCollations=8, you'll 
probably want maxCollationTries at least that large.  Maybe 10-20 would be 
better.  Make it as low as possible to get generally good results, or as high 
as possible before the performance on a query with many misspelled words gets 
too bad.

Also, use a spellcheck.count greater than 2.  This is as many corrections per 
misspelled term you want it to consider.  If using DirectSolrSpellChecker, you 
can have it set low, 5-10 might be good.  If using IndexBased- or FileBased 
spell checkers, use at least 10.

Also, do not use onlyMorePopular unless you indeed want every term in the 
user's query to be replaced with higher-frequency terms (even correctly-spelled 
terms get replaced).  If you want it to suggest even for words that are in the 
dictionary, try spellcheck.alternativeTermCount instead.  Try setting it to 
about half of spellcheck.count (but at least 10 if using IndexBased- or 
FileBased spell checkers).
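
Putting those numbers together, a reasonable starting point with
DirectSolrSpellChecker would be request parameters (or handler defaults) along
these lines -- an illustration only, tune the values to your own data:

  spellcheck=true
  spellcheck.count=10
  spellcheck.alternativeTermCount=5
  spellcheck.collate=true
  spellcheck.maxCollations=8
  spellcheck.maxCollationTries=15
  spellcheck.onlyMorePopular=false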

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nicholas Fellows [mailto:n...@djdownload.com] 
Sent: Wednesday, May 29, 2013 11:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

I also have problems getting the solrspellchecker to utilise existing FQ
params correctly.
we have some fairly monster queries

eg : http://pastebin.com/4XzGpfeC

I cannot seem to get our FQ parameters to be honored when generating
results.
In essence i am getting collations that yield no results when the filter
query is applied.

We have items that are by default not shown when out of stock or
forthcoming. the user
can select whether to show these or not.

Is there something wrong with my query or perhaps my use case is not
supported?

Im using nested query and local params etc

Would very much appreciate some assistance on this one as 2days worth of
hacking, and pestering
people on IRC have not yet yeilded a solution for me. Im not even sure what
i am trying
is even possible! Some sort of clarification on this would really help!

Cheers

Nick...




On 29 May 2013 15:57, Andy Lester a...@petdance.com wrote:


 On May 29, 2013, at 9:46 AM, Dyer, James james.d...@ingramcontent.com
 wrote:

  Just an instanity check, I see I had misspelled maxCollations as
 maxCollation in my prior response.  When you tested with this set the
 same as maxCollationTries, did you correct my spelling?

 Yes, definitely.

 Thanks for the ticket.  I am looking at the effects of turning on
 spellcheck.onlyMorePopular to true, which reduces the number of collations
 it seems to do, but doesn't affect the underlying question of is the
 spellchecker doing FQs properly?

 Thanks,
 Andy

 --
 Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance




-- 
Nick Fellows
DJdownload.com
---
10 Greenland Street
London
NW10ND
United Kingdom
---
n...@djdownload.com (E)

---
www.djdownload.com



Re: Not able to search Spanish word with accent in Solr

2013-05-29 Thread Raymond Wiker
On May 29, 2013, at 18:09 , jignesh js.vishava...@gmail.com wrote:
 Solr returning error 500, when i post data with ascent chars...
 
 Any solution for that?

The solution probably involves using the correct encoding, and ensuring that 
the HTTP request sets the appropriate header values accordingly.

In other words, more likely a pilot error than a SOLR error... at least that 
was the case for me :-)
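
Concretely, that means something along these lines (host, core and file name are
placeholders) -- make sure the payload really is UTF-8 and say so in the header:

  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: text/xml; charset=UTF-8' \
       --data-binary @docs-with-accents.xml

If the declared charset and the actual bytes disagree, the XML parse can fail and
you get exactly the kind of 500 described above.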

Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Shalin Shekhar Mangar
James, this is very useful information. Can you please add this to the wiki?


On Wed, May 29, 2013 at 10:36 PM, Dyer, James
james.d...@ingramcontent.comwrote:

 Instead of maxCollationTries=0, use a value greater than zero.  Zero
 means not to check if the collation will return hits.  1 means to test 1
 possible combination against the index and return it only if it returns
 hits.  2 tries up to 2 possibilities, etc.  As you have
 spellcheck.maxCollations=8, you'll probably want maxCollationTries at
 least that large.  Maybe 10-20 would be better.  Make it as low as possible
 to get generally good results, or as high as possible before the
 performance on a query with many misspelled words gets too bad.

 Also, use a spellcheck.count greater than 2.  This is as many corrections
 per misspelled term you want it to consider.  If using
 DirectSolrSpellChecker, you can have it set low, 5-10 might be good.  If
 using IndexBased- or FileBased spell checkers, use at least 10.

 Also, do not use onlyMorePopular unless you indeed want every term in
 the user's query to be replaced with higher-frequency terms (even
 correctly-spelled terms get replaced).  If you want it to suggest even for
 words that are in the dictionary, try spellcheck.alternativeTermCount
 instead.  Try setting it to about half of spellcheck.count (but at least
 10 if using IndexBased- or FileBased spell checkers).

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Nicholas Fellows [mailto:n...@djdownload.com]
 Sent: Wednesday, May 29, 2013 11:06 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Why do FQs make my spelling suggestions so slow?

 I also have problems getting the solrspellchecker to utilise existing FQ
 params correctly.
 we have some fairly monster queries

 eg : http://pastebin.com/4XzGpfeC

 I cannot seem to get our FQ parameters to be honored when generating
 results.
 In essence i am getting collations that yield no results when the filter
 query is applied.

 We have items that are by default not shown when out of stock or
 forthcoming. the user
 can select whether to show these or not.

 Is there something wrong with my query or perhaps my use case is not
 supported?

 Im using nested query and local params etc

 Would very much appreciate some assistance on this one as 2days worth of
 hacking, and pestering
 people on IRC have not yet yeilded a solution for me. Im not even sure what
 i am trying
 is even possible! Some sort of clarification on this would really help!

 Cheers

 Nick...




 On 29 May 2013 15:57, Andy Lester a...@petdance.com wrote:

 
  On May 29, 2013, at 9:46 AM, Dyer, James james.d...@ingramcontent.com
 
  wrote:
 
   Just an instanity check, I see I had misspelled maxCollations as
  maxCollation in my prior response.  When you tested with this set the
  same as maxCollationTries, did you correct my spelling?
 
  Yes, definitely.
 
  Thanks for the ticket.  I am looking at the effects of turning on
  spellcheck.onlyMorePopular to true, which reduces the number of
 collations
  it seems to do, but doesn't affect the underlying question of is the
  spellchecker doing FQs properly?
 
  Thanks,
  Andy
 
  --
  Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
 
 


 --
 Nick Fellows
 DJdownload.com
 ---
 10 Greenland Street
 London
 NW10ND
 United Kingdom
 ---
 n...@djdownload.com (E)

 ---
 www.djdownload.com




-- 
Regards,
Shalin Shekhar Mangar.


RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
It has been in the wiki, more or less.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and following 
sections.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, May 29, 2013 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

James, this is very useful information. Can you please add this to the wiki?


On Wed, May 29, 2013 at 10:36 PM, Dyer, James
james.d...@ingramcontent.comwrote:

 Instead of maxCollationTries=0, use a value greater than zero.  Zero
 means not to check if the collation will return hits.  1 means to test 1
 possible combination against the index and return it only if it returns
 hits.  2 tries up to 2 possibilities, etc.  As you have
 spellcheck.maxCollations=8, you'll probably want maxCollationTries at
 least that large.  Maybe 10-20 would be better.  Make it as low as possible
 to get generally good results, or as high as possible before the
 performance on a query with many misspelled words gets too bad.

 Also, use a spellcheck.count greater than 2.  This is as many corrections
 per misspelled term you want it to consider.  If using
 DirectSolrSpellChecker, you can have it set low, 5-10 might be good.  If
 using IndexBased- or FileBased spell checkers, use at least 10.

 Also, do not use onlyMorePopular unless you indeed want every term in
 the user's query to be replaced with higher-frequency terms (even
 correctly-spelled terms get replaced).  If you want it to suggest even for
 words that are in the dictionary, try spellcheck.alternativeTermCount
 instead.  Try setting it to about half of spellcheck.count (but at least
 10 if using IndexBased- or FileBased spell checkers).

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Nicholas Fellows [mailto:n...@djdownload.com]
 Sent: Wednesday, May 29, 2013 11:06 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Why do FQs make my spelling suggestions so slow?

 I also have problems getting the solrspellchecker to utilise existing FQ
 params correctly.
 we have some fairly monster queries

 eg : http://pastebin.com/4XzGpfeC

 I cannot seem to get our FQ parameters to be honored when generating
 results.
 In essence i am getting collations that yield no results when the filter
 query is applied.

 We have items that are by default not shown when out of stock or
 forthcoming. the user
 can select whether to show these or not.

 Is there something wrong with my query or perhaps my use case is not
 supported?

 Im using nested query and local params etc

 Would very much appreciate some assistance on this one as 2days worth of
 hacking, and pestering
 people on IRC have not yet yeilded a solution for me. Im not even sure what
 i am trying
 is even possible! Some sort of clarification on this would really help!

 Cheers

 Nick...




 On 29 May 2013 15:57, Andy Lester a...@petdance.com wrote:

 
  On May 29, 2013, at 9:46 AM, Dyer, James james.d...@ingramcontent.com
 
  wrote:
 
   Just an instanity check, I see I had misspelled maxCollations as
  maxCollation in my prior response.  When you tested with this set the
  same as maxCollationTries, did you correct my spelling?
 
  Yes, definitely.
 
  Thanks for the ticket.  I am looking at the effects of turning on
  spellcheck.onlyMorePopular to true, which reduces the number of
 collations
  it seems to do, but doesn't affect the underlying question of is the
  spellchecker doing FQs properly?
 
  Thanks,
  Andy
 
  --
  Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
 
 


 --
 Nick Fellows
 DJdownload.com
 ---
 10 Greenland Street
 London
 NW10ND
 United Kingdom
 ---
 n...@djdownload.com (E)

 ---
 www.djdownload.com




-- 
Regards,
Shalin Shekhar Mangar.


Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Benson Margulies
The comment here is clearly wrong, since there is no division by two.

I think that the code is wrong, because this results in not starting
runners when it should start runners. Am I misanalyzing?

if (runners.isEmpty() || (queue.remainingCapacity() < queue.size()
    // queue is half full and we can add more runners
    && runners.size() < threadCount)) {


Re: Indexing Solr, Multiple Doc Types. Production of Multiple Values for UniqueKey Field Using TemplateTransformer

2013-05-29 Thread Chris Hostetter

: org.apache.solr.common.SolrException: Document contains multiple values for
: uniqueKey field: uid=[A_1, dc1999fcf12df900]

By the looks of things, your TemplateTransformer is properly creating a 
value of A_${atest.id} where ${atest.id} == 1 for that document ... 
the problem seems to be that somehow another value is getting put in your 
uid field containing dc1999fcf12df900

Based on your stack trace, i suspect that in addition to having DIH 
create a value for your uid field, you also have 
SignatureUpdateProcessorFactory configured (in your solrconfig.xml) to 
generate a synthetic unique id based on the signature of some fields as 
well...

: 
org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:194)
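
For reference, the sort of solrconfig.xml chain that produces exactly this symptom
looks like the following (the "fields" list and the chain name here are guesses,
not necessarily your actual config):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">uid</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

With signatureField pointed at your single-valued uniqueKey, the processor adds a
hash like dc1999fcf12df900 alongside the A_1 value DIH already produced -- hence
the "multiple values" error.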


-Hoss


Re: Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Shalin Shekhar Mangar
On Wed, May 29, 2013 at 11:29 PM, Benson Margulies bimargul...@gmail.comwrote:

 The comment here is clearly wrong, since there is no division by two.

 I think that the code is wrong, because this results in not starting
 runners when it should start runners. Am I misanalyzing?

 if (runners.isEmpty() || (queue.remainingCapacity()  queue.size() // queue

   // is

   // half

   // full

   // and

   // we

   // can

   // add

   // more

   // runners
runners.size()  threadCount)) {



queue.remainingCapacity() returns capacity - queue.size() so the comment is
correct.
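
A quick way to see it (capacity 10 picked arbitrarily for illustration):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HalfFullDemo {
  public static void main(String[] args) {
    BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>(10); // capacity 10
    for (int i = 0; i < 6; i++) queue.offer(new Object());
    System.out.println(queue.size());              // 6
    System.out.println(queue.remainingCapacity()); // 4, i.e. 10 - 6
    // remainingCapacity() < size() holds exactly when the queue is more than half full
  }
}

So the condition fires once the queue passes the halfway mark and there is still
room for another runner.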

-- 
Regards,
Shalin Shekhar Mangar.


Re: Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Benson Margulies
Ah. So now I have to find some other explanation of why it never
creates more than one thread, even when I make a very deep queue and
specify 6 threads.

On Wed, May 29, 2013 at 2:25 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Wed, May 29, 2013 at 11:29 PM, Benson Margulies 
 bimargul...@gmail.comwrote:

 The comment here is clearly wrong, since there is no division by two.

 I think that the code is wrong, because this results in not starting
 runners when it should start runners. Am I misanalyzing?

 if (runners.isEmpty() || (queue.remainingCapacity()  queue.size() // queue

   // is

   // half

   // full

   // and

   // we

   // can

   // add

   // more

   // runners
runners.size()  threadCount)) {



 queue.remainingCapacity() returns capacity - queue.size() so the comment is
 correct.

 --
 Regards,
 Shalin Shekhar Mangar.


Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread bbarani
Hoss, for some reason this doesn't work when I pass the latlong value via
query..

This is the query.. It just returns all the values for fname='peter'
(doesn't filter for Tarmac, Florida).

fl=*,score&rows=10&qt=findperson&fps_latlong=26.22084,-80.29&fps_fname=peter

*solrconfig.xml*

<lst name="appends">
<str name="fq">{!switch case='*:*' default=$fq_bbox
v=$fps_latlong}</str>
</lst>
<lst name="invariants">
<str name="fq_bbox">_query_:{!bbox pt=$fps_latlong sfield=geo
d=$fps_dist}</str>
</lst>

*Works when used via custom component:*

This works fine when the latlong value is passed via custom component. We
have a custom component which gets the location name via query, calculates
the corresponding lat long co-ordinates stored in TSV file and passes the
co-ordinates to the query.


*Custom component config:*

 <searchComponent name="geo" class="com.customcomponent">
<str name="placenameFile">centroids.tsv</str>
<str name="placenameQueryParam">fps_where</str>
<str name="latQueryParam">fps_latitude</str>
<str name="lonQueryParam">fps_longitude</str>
<str name="latlonQueryParam">fps_latlong</str>
<str name="distQueryParam">fps_dist</str>
<float name="defaultDist">48.2803</float>
<float name="boost">1.0</float>
  </searchComponent>

*Custom component query:*
fl=*,score&rows=10&*fps_where=new york,
ny*&qt=findperson&fps_latlong=26.22084,-80.29&fps_dist=.10&fps_fname=peter

Is it a bug?





Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread Chris Hostetter

: Hoss, for some reason this doesn't work when I pass the latlong value via
: query..
...
: fl=*,scorerows=10qt=findpersonfps_latlong=26.22084,-80.29fps_fname=peter

Hmmm, are these appends & invariants on your findperson requestHandler?

What does debugQuery=true show you the applied filters are?

: lst name=invariants
: str name=fq_bbox_query_:{!bbox pt=$fps_latlong sfield=geo
: d=$fps_dist}/str
: /lst

Why do you have the _query_ hack in there?  I haven't had a chance to test 
this, but perhaps that hack doesn't play nicely with localparam variable 
substitution? It should just be...

   <str name="fq_bbox">{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}</str>

: This works fine when the latlong value is passed via custom component. We
: have a custom component which gets the location name via query, calculates
: the corresponding lat long co-ordinates stored in TSV file and passes the
: co-ordinates to the query.


Ok wait a minute -- all bets are off about this working if you have a 
custom component in the mix adding/removing params.  You need to provide 
us with more details about exactly how your component works, where it's 
configured in the component list, and how it is adding the fps_latlong 
param it generates to the query, because my guesses are one of two things 
are happening:

1) your component is doing its logic after the query parsing has already 
happened and the variables have been evaluated -- at which point 
fps_latlong isn't set yet, so you get the case='*:*' behavior

2) your component is doing its logic before the query parsing happens, 
but it is setting the value of fps_latlong in a way that the query parsing 
code doesn't see it when resolving the local variables.


-Hoss


Problem with PatternReplaceCharFilter

2013-05-29 Thread jasimop
Hi,

I have a problem when using PatternReplaceCharFilter when indexing a field.
I created the following field type:
<fieldType name="testfield" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="&lt;TextDocument[^&gt;]*&gt;" replacement=""/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="&lt;/TextDocument&gt;" replacement=""/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="&lt;TextLine[^&lt;]+ content=\&quot;([^\&quot;]*)\&quot;[^/]+/&gt;"
        replacement="$1 "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="lang/stopwords_de.txt" format="snowball"
        enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And I created a field that is indexed and stored:
<field name="testfield" type="testfield" indexed="true" stored="true" />

I need to index a document with such a structure in this field:
<TextDocument filename="somefile.end" mime="..." created="..."><TextLine
aa="bb" cc="dd" content="the content to search in" ee="ff" /><TextLine
aa="bb" cc="dd" content="the second content line" ee="ff" /></TextDocument>

Basically I have some sort of XML structure, i need only to search in the
content attribute, but when highlighting i need to get back to the
enclosing XML tags.

So with the 3 Regex I want to remove all unwanted tags and tokenize/index
only the important data.
I know that I could use HTMLStripCharFilterFactory but then also the tag
names, attribute names and values get indexed. And I don't want to search in
that content too.

I read the following in the doc:
NOTE: If you produce a phrase that has different length to source string and
the field is used for highlighting for a term of the phrase, you will face a
trouble. 

The thing is, why is this the case? When running the analyze from solr admin
the CharFilters generate
the content to search in the second content line which looks perfect, but
then the StandardTokenizer
gets the start and end positions of the tokens wrong. Why is this the case?
Does there exist another solution to my problem?
Could I use the following method I saw in the doc of
PatternReplaceCharFilter:
protected int correct(int currentOff) Documentation: Retrieve the corrected
offset.

How could I solve such a task?








Support for Mongolian language

2013-05-29 Thread Sagar Chaturvedi
Hi All,

Does solr provide support for Mongolian language?

Also which filters and tokenizers must be used for Chinese, Japanese and Korean 
languages?

Regards,
Sagar Chaturvedi




Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread bbarani
Ok..I removed all my custom components from findperson request handler..

  <requestHandler name="findperson" class="solr.SearchHandler"
default="false">
    <lst name="defaults">
      <str name="defType">lucene</str>
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="q.op">AND</str>
      <str name="qf">person_name_all_i</str>
      <int name="score_truncation_cliff">50</int>
      <int name="fps_dist">32</int>
      <str name="q">
      *:*
      </str>
      <lst name="appends">
        <str name="fq">{!switch case='*:*' default=$fq_bbox
v=$fps_latlong}</str>
      </lst>
      <lst name="invariants">
        <str name="fq_bbox">_query_:{!bbox pt=$fps_latlong sfield=geo
d=$fps_dist}</str>
      </lst>
    </lst>
    <arr name="components">
      <str>query</str>
      <str>debug</str>
    </arr>
  </requestHandler>


My query:
select?fl=*,score&rows=10&qt=findperson&fps_latlong=42.3482,-75.1890

The above query just returns everything back from SOLR (should only return
results corresponding to lat and long values passed in the query)...

I even tried changing the below hack, but got the same results.

<str name="fq_bbox">{!bbox pt=$fps_latlong sfield=geo
d=$fps_dist}</str>

Not sure if I am missing something...






Re: Support for Mongolian language

2013-05-29 Thread bbarani
Check out..

wiki.apache.org/solr/LanguageAnalysis

For some reason the above site takes a long time to open...








Re: Query syntax error: Cannot parse ....

2013-05-29 Thread bbarani
# has a separate meaning in URL.. You need to encode that..

http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters.
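
For example (assuming the term you are searching really contains a literal #):

  solr/select?q=C%23        instead of        solr/select?q=C#

Everything after an unencoded # is treated as a URL fragment by the client and
never reaches Solr, so the # (and whatever follows it) has to be sent as %23.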





Re: Grouping results based on the field which matched the query

2013-05-29 Thread bbarani
Not sure if you are looking for this..

http://wiki.apache.org/solr/FieldCollapsing





Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread Chris Hostetter

: <lst name="defaults">
...
:   <lst name="appends">
: <str name="fq">{!switch case='*:*' default=$fq_bbox
: v=$fps_latlong}</str>
: </lst>
: <lst name="invariants">
: <str name="fq_bbox">_query_:{!bbox pt=$fps_latlong sfield=geo
: d=$fps_dist}</str>
: </lst>
: </lst>

...you have your appends and invariants nested inside your defaults -- 
they should be siblings...

 <lst name="defaults">
...
 </lst>
 <lst name="appends">
...
 </lst>
 <lst name="invariants">
...
 </lst>

-Hoss


Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread bbarani
 I totally missed that..Sorry about that :)...It seems to work fine now...





RE: Note on The Book

2013-05-29 Thread Markus Jelsma
Jack,

I'd prefer tons of information instead of a meager 300 page book that leaves a 
lot of questions. I'm looking forward to a paperback or hardcover book and 
price doesn't really matter, it is going to be worth it anyway.

Thanks,
Markus

 
 
-Original message-
 From:Jack Krupansky j...@basetechnology.com
 Sent: Wed 29-May-2013 15:10
 To: solr-user@lucene.apache.org
 Subject: Re: Note on The Book
 
 Erick, your point is well taken. Although my primary interest/skill is to 
 produce a solid foundation reference (including tons of examples), the real 
 goal is to then build on top of that foundation.
 
 While I focus on the hard-core material - which really does include some 
 narrative and lots of examples in addition to tons of mere reference, my 
 co-author, Ryan Tabora, will focus almost exclusively on... narrative and 
 diagrams.
 
 And when I say reference, I also mean lots of examples. Even as the 
 hard-core reference stabilizes, the examples will continue to grow (like 
 weeds!).
 
 Once we get the current, existing, under-review, chapters packaged into the 
 new book and available for purchase and download (maybe Lulu, not decided) - 
 available, in a couple of weeks, it will be updated approximately every 
 other week, both with additional reference material, and additional 
 narrative and diagrams.
 
 One of our priorities (after we get through Stage 0 of the next few weeks) 
 is to in fact start giving each of the long Deep Dive Chapters enough 
 narrative lead to basically say exactly that - why you should care.
 
 A longer-term priority is to improve the balance of narrative and hard-core 
 reference. Yeah, that will be a lot of pages. It already is. We were at 907 
 pages and I was about to drop in another 166 pages on update handlers when 
 O'Reilly threw up their hands and pulled the plug. I was estimating 1200 
 pages at that stage. And I'll probably have another 60-80 pages on update 
 request processors within a week or so. With more to come. That did include 
 a lot of hard-core material and example code for Lucene, which won't be in 
 the new Solr-only book. By focusing on an e-book the raw page count alone 
 becomes moot. We haven't given up on print - the intent is eventually to 
 have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3 
 to $5 each) and slimmer print volumes for people who don't need everything 
 in print.
 
 In fact, we will likely offer the revamped initial chapters of the book as a 
 standalone introduction to Solr - narrative introduction (why should you 
 care about Solr), basic concepts of Lucene and Solr (and why you should 
 care!), brief tutorial walkthough of the major feature areas of Solr, and a 
 case study. The intent would be both e-book and a slim print volume (75 
 pages?).
 
 Another priority (beyond Stage 0) is to develop a detailed roadmap diagram 
 of Solr and how applications can use Solr, and then use that to show how 
 each of the Deep Dive sections (heavy reference, but gradually adding more 
 narrative over time.)
 
 We will probably be very open to requests - what people really wish a book 
 would actually do for them. The only request we won't be open to is to do it 
 all in only 300 pages.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Erick Erickson
 Sent: Wednesday, May 29, 2013 7:19 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Note on The Book
 
 FWIW, picking up on Alexandre's point. One of my continual
 frustrations with virtually _all_
 technical books is they become endless pages of details without ever
 mentioning why
 the hell I should care. Unfortunately, explaining use-cases for
 everything would only make
 the book about 10,000 pages long. Siiigh.
 
 I guess you can take this as a vote for narrative
 
 Erick
 
 On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com 
 wrote:
  We'll have a blog for the book. We hope to have a first
  raw/rough/partial/draft published as an e-book in maybe 10 days to 2 
  weeks.
  As soon as we get that process under control, we'll start the blog. I'll
  keep your email on file and keep you posted.
 
  -- Jack Krupansky
 
  -Original Message- From: Swati Swoboda
  Sent: Tuesday, May 28, 2013 1:36 PM
  To: solr-user@lucene.apache.org
  Subject: RE: Note on The Book
 
 
  I'd definitely prefer the spiral bound as well. E-books are great and your
  draft version seems very reasonably priced (aka I would definitely get 
  it).
 
  Really looking forward to this. Is there a separate mailing list / etc. 
  for
  the book for those who would like to receive updates on the status of the
  book?
 
  Thanks
 
  Swati Swoboda
  Software Developer - Igloo Software
  +1.519.489.4120  sswob...@igloosoftware.com
 
  Bring back Cake Fridays – watch a video you’ll actually like
  http://vimeo.com/64886237
 
 
  -Original Message-
  From: Jack Krupansky [mailto:j...@basetechnology.com]
  Sent: Thursday, May 23, 2013 7:15 PM

Re: Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Benson Margulies
I now understand the algorithm, but I don't understand why it is the way it is.

Consider one of these objects configure with a handful of threads and
a pretty big queue.

When the first request comes in, the object creates one runner. It
then won't create a second runner until the Q reaches 1/2-full.

If the idea is that we want to pile up 'a lot' (1/2-of-a-q) of work
before sending any of it, why start that first runner?

On Wed, May 29, 2013 at 2:45 PM, Benson Margulies bimargul...@gmail.com wrote:
 Ah. So now I have to find some other explanation of why it never
 creates more than one thread, even when I make a very deep queue and
 specify 6 threads.

 On Wed, May 29, 2013 at 2:25 PM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
 On Wed, May 29, 2013 at 11:29 PM, Benson Margulies 
 bimargul...@gmail.comwrote:

 The comment here is clearly wrong, since there is no division by two.

 I think that the code is wrong, because this results in not starting
 runners when it should start runners. Am I misanalyzing?

 if (runners.isEmpty() || (queue.remainingCapacity()  queue.size() // queue

   // is

   // half

   // full

   // and

   // we

   // can

   // add

   // more

   // runners
runners.size()  threadCount)) {



 queue.remainingCapacity() returns capacity - queue.size() so the comment is
 correct.

 --
 Regards,
 Shalin Shekhar Mangar.


RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-29 Thread Bryan Loofbourrow
Andy,

 I don't understand why it's taking 7 secs to return highlights. The size
 of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set to
 1024 for this verification purpose and that should be more than enough.
 The processor is plenty powerful enough as well.

 Running VisualVM shows all my CPU time being taken by mainly these 3
 methods:

 org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
 nfo.getStartOffset()
 org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
 nfo.getStartOffset()
 org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap(
 )

That is a strange and interesting set of things to be spending most of
your CPU time on. The implication, I think, is that the number of term
matches in the document for terms in your query (or, at least, terms
matching exact words or the beginning of phrases in your query) is
extremely high . Perhaps that's coming from this partial word match you
mention -- how does that work?

-- Bryan

 My guess is that this has something to do with how I'm handling partial
 word matches/highlighting. I have setup another request handler that
 only searches the whole word fields and it returns in 850 ms with
 highlighting.

 Any ideas?

 - Andy


 -Original Message-
 From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
 Sent: Monday, May 20, 2013 1:39 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Slow Highlighter Performance Even Using
 FastVectorHighlighter

 My guess is that the problem is those 200M documents.
 FastVectorHighlighter is fast at deciding whether a match, especially a
 phrase, appears in a document, but it still starts out by walking the
 entire list of term vectors, and ends by breaking the document into
 candidate-snippet fragments, both processes that are proportional to the
 length of the document.

 It's hard to do much about the first, but for the second you could
 choose
 to expose FastVectorHighlighter's FieldPhraseList representation, and
 return offsets to the caller rather than fragments, building up your own
 snippets from a separate store of indexed files. This would also permit
 you to set stored=false, improving your memory/core size ratio, which
 I'm guessing could use some improving. It would require some work, and
 it
 would require you to store a representation of what was indexed outside
 the Solr core, in some constant-bytes-to-character representation that
 you
 can use offsets with (e.g. UTF-16, or ASCII+entity references).

 However, you may not need to do this -- it may be that you just need
 more
 memory for your search machine. Not JVM memory, but memory that the O/S
 can use as a file cache. What do you have now? That is, how much memory
 do
 you have that is not used by the JVM or other apps, and how big is your
 Solr core?

 One way to start getting a handle on where time is being spent is to set
 up VisualVM. Turn on CPU sampling, send in a bunch of the slow highlight
 queries, and look at where the time is being spent. If it's mostly in
 methods that are just reading from disk, buy more memory. If you're on
 Linux, look at what top is telling you. If the CPU usage is low and the
 wa number is above 1% more often than not, buy more memory (I don't
 know
 why that wa number makes sense, I just know that it has been a good rule
 of thumb for us).

 -- Bryan

  -Original Message-
  From: Andy Brown [mailto:andy_br...@rhoworld.com]
  Sent: Monday, May 20, 2013 9:53 AM
  To: solr-user@lucene.apache.org
  Subject: Slow Highlighter Performance Even Using FastVectorHighlighter
 
  I'm providing a search feature in a web app that searches for
 documents
  that range in size from 1KB to 200MB of varying MIME types (PDF, DOC,
  etc). Currently there are about 3000 documents and this will continue
 to
  grow. I'm providing full word search and partial word search. For each
  document, there are three source fields that I'm interested in
 searching
  and highlighting on: name, description, and content. Since I'm
 providing
  both full and partial word search, I've created additional fields that
  get tokenized differently: name_par, description_par, and content_par.
  Those are indexed and stored as well for querying and highlighting. As
  suggested in the Solr wiki, I've got two catch all fields text and
  text_par for faster querying.
 
  An average search results page displays 25 results and I provide
 paging.
  I'm just returning the doc ID in my Solr search results and response
  times have been quite good (1 to 10 ms). The problem in performance
  occurs when I turn on highlighting. I'm already using the
  FastVectorHighlighter and depending on the query, it has taken as long
  as 15 seconds to get the highlight snippets. However, this isn't
 always
  the case. Certain query terms result in 1 sec or less response time.
 In
  any case, 15 seconds is way too long.
 
  I'm fairly new to Solr but I've spent days coming up with what 

Solr query performance tool

2013-05-29 Thread Spyros Lambrinidis
Hi,

Lately we are seeing increased latency times on solr and we would like to
know which queries / facet searches are the most time consuming and heavy
for our system.

Is there any tool equivalent to the mysql slow log? Does solr keep the times
each query takes in some log?

Thank you for your help.

-S.


-- 
Spyros Lambrinidis
Head of Engineering & Commando of
PeoplePerHour.com <http://www.peopleperhour.com>
Evmolpidon 23
118 54, Gkazi
Athens, Greece
Tel: +30 210 3455480

Follow us on Facebook http://www.facebook.com/peopleperhour
Follow us on Twitter http://twitter.com/#%21/peopleperhour


Re: Problem with PatternReplaceCharFilter

2013-05-29 Thread Jack Krupansky
Just replace the stripped markup with the equivalent number of spaces to 
maintain positions.
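
For example, a minimal pre-indexing sketch of that idea (plain Java, the class
name is invented here; it blanks whole tags with an equal number of spaces, so
for your TextLine case you would additionally keep the content attribute values
in place rather than blanking them):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkupBlanker {

  private static final Pattern TAG = Pattern.compile("<[^>]*>");

  // Overwrite each markup run with the same number of spaces, so every
  // remaining character keeps its original offset for highlighting.
  public static String blank(String input) {
    StringBuilder out = new StringBuilder(input);
    Matcher m = TAG.matcher(input);
    while (m.find()) {
      for (int i = m.start(); i < m.end(); i++) {
        out.setCharAt(i, ' ');
      }
    }
    return out.toString();
  }
}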


Was there some specific problem you were encountering?

-- Jack Krupansky

-Original Message- 
From: jasimop

Sent: Wednesday, May 29, 2013 4:12 PM
To: solr-user@lucene.apache.org
Subject: Problem with PatternReplaceCharFilter

Hi,

I have a Problem when using PatternReplaceCharFilter when indexing a field.
I created the following field:
   <fieldType name="testfield" class="solr.TextField">
     <analyzer type="index">
       <charFilter class="solr.PatternReplaceCharFilterFactory"
           pattern="&#60;TextDocument[^&#62;]*&#62;" replacement=""/>
       <charFilter class="solr.PatternReplaceCharFilterFactory"
           pattern="&#60;/TextDocument&#62;" replacement=""/>
       <charFilter class="solr.PatternReplaceCharFilterFactory"
           pattern="&#60;TextLine[^&#60;]+ content=\&#34;([^\&#34;]*)\&#34;[^/]+/&#62;"
           replacement="$1 "/>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="lang/stopwords_de.txt" format="snowball"
           enablePositionIncrements="true"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>

And I created a field that is indexed and stored:
<field name="testfield" type="testfield" indexed="true" stored="true"/>

I need to index a document with such a structure in this field:
<TextDocument filename="somefile.end" mime="..." created="..."><TextLine
aa="bb" cc="dd" content="the content to search in" ee="ff" /><TextLine
aa="bb" cc="dd" content="the second content line" ee="ff" /></TextDocument>

Basically I have some sort of XML structure, i need only to search in the
content attribute, but when highlighting i need to get back to the
enclosing XML tags.

So with the 3 regexes I want to remove all unwanted tags and tokenize/index
only the important data.
I know that I could use HTMLStripCharFilterFactory but then also the tag
names, attribute names and values get indexed, and I don't want to search in
that content either.

I read the following in the doc:
NOTE: If you produce a phrase that has different length to source string and
the field is used for highlighting for a term of the phrase, you will face a
trouble.

The thing is, why is this the case? When running analysis from the Solr admin UI,
the CharFilters generate
"the content to search in the second content line", which looks perfect, but
then the StandardTokenizer
gets the start and end positions of the tokens wrong. Why is this the case?
Is there another solution to my problem?
Could I use the following method I saw in the doc of
PatternReplaceCharFilter:
protected int correct(int currentOff) Documentation: Retrieve the corrected
offset.

How could I solve such a task?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-PatternReplaceCharFilter-tp4066869.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Note on The Book

2013-05-29 Thread Jack Krupansky

Markus,

Okay, more pages it is!

-- Jack Krupansky

-Original Message- 
From: Markus Jelsma

Sent: Wednesday, May 29, 2013 5:35 PM
To: solr-user@lucene.apache.org
Subject: RE: Note on The Book

Jack,

I'd prefer tons of information instead of a meager 300 page book that leaves 
a lot of questions. I'm looking forward to a paperback or hardcover book and 
price doesn't really matter, it is going to be worth it anyway.


Thanks,
Markus



-Original message-

From:Jack Krupansky j...@basetechnology.com
Sent: Wed 29-May-2013 15:10
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

Erick, your point is well taken. Although my primary interest/skill is to
produce a solid foundation reference (including tons of examples), the real
goal is to then build on top of that foundation.

While I focus on the hard-core material - which really does include some
narrative and lots of examples in addition to tons of mere reference, my
co-author, Ryan Tabora, will focus almost exclusively on... narrative and
diagrams.

And when I say reference, I also mean lots of examples. Even as the
hard-core reference stabilizes, the examples will continue to grow (like
weeds!).

Once we get the current, existing, under-review, chapters packaged into the
new book and available for purchase and download (maybe Lulu, not decided) -
available, in a couple of weeks, it will be updated approximately every
other week, both with additional reference material, and additional
narrative and diagrams.

One of our priorities (after we get through Stage 0 of the next few weeks)
is to in fact start giving each of the long Deep Dive Chapters enough
narrative lead to basically say exactly that - why you should care.

A longer-term priority is to improve the balance of narrative and 
hard-core
reference. Yeah, that will be a lot of pages. It already is. We were at 907
pages and I was about to drop in another 166 pages on update handlers when
O'Reilly threw up their hands and pulled the plug. I was estimating 1200
pages at that stage. And I'll probably have another 60-80 pages on update
request processors within a week or so. With more to come. That did include
a lot of hard-core material and example code for Lucene, which won't be in
the new Solr-only book. By focusing on an e-book the raw page count alone
becomes moot. We haven't given up on print - the intent is eventually to
have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3
to $5 each) and slimmer print volumes for people who don't need everything
in print.

In fact, we will likely offer the revamped initial chapters of the book as a
standalone introduction to Solr - narrative introduction (why should you
care about Solr), basic concepts of Lucene and Solr (and why you should
care!), brief tutorial walkthough of the major feature areas of Solr, and 
a

case study. The intent would be both e-book and a slim print volume (75
pages?).

Another priority (beyond Stage 0) is to develop a detailed roadmap diagram
of Solr and how applications can use Solr, and then use that to show how
each of the Deep Dive sections (heavy reference, but gradually adding more
narrative over time.)

We will probably be very open to requests - what people really wish a book
would actually do for them. The only request we won't be open to is to do it
all in only 300 pages.

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, May 29, 2013 7:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

FWIW, picking up on Alexandre's point. One of my continual
frustrations with virtually _all_
technical books is they become endless pages of details without ever
mentioning why
the hell I should care. Unfortunately, explaining use-cases for
everything would only make
the book about 10,000 pages long. Siiigh.

I guess you can take this as a vote for narrative

Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com
wrote:
 We'll have a blog for the book. We hope to have a first
 raw/rough/partial/draft published as an e-book in maybe 10 days to 2
 weeks.
 As soon as we get that process under control, we'll start the blog. I'll
 keep your email on file and keep you posted.

 -- Jack Krupansky

 -Original Message- From: Swati Swoboda
 Sent: Tuesday, May 28, 2013 1:36 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Note on The Book


 I'd definitely prefer the spiral bound as well. E-books are great and your
 draft version seems very reasonably priced (aka I would definitely get
 it).

 Really looking forward to this. Is there a separate mailing list / etc. for
 the book for those who would like to receive updates on the status of the
 book?

 Thanks

 Swati Swoboda
 Software Developer - Igloo Software
 +1.519.489.4120  sswob...@igloosoftware.com

 Bring back Cake Fridays – watch a video you’ll actually like
 http://vimeo.com/64886237


 -Original Message-
 From: Jack 

java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.

2013-05-29 Thread bbarani
Hi,

I am overriding the query component and creating a custom component. I am
using _responseDocs from org.apache.solr.handler.component.ResponseBuilder
to get the values. I have my component in the same package
(org.apache.solr.handler.component) to access the _responseDocs value.

Everything works fine when I run the test for this component but I am
getting the below error when I package the custom component in a jar and
place it in lib directory (inside solr/lib - using basic jetty
configuration).

I assume this is due to the fact that the classes are loaded by different
class loaders at runtime. Is there a way to resolve this?

str name=msgjava.lang.IllegalAccessError: tried to access field
org.apache.solr.handler.component.ResponseBuilder._responseDocs from class
org.apache.solr.handler.component.WPFastDistributedQueryComponent/strstr
name=tracejava.lang.RuntimeException: java.lang.IllegalAccessError: tried
to access field
org.apache.solr.handler.component.ResponseBuilder._responseDocs from class
org.apache.solr.handler.component.CustomComponent
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalAccessError: tried to access field
org.apache.solr.handler.component.ResponseBuilder._responseDocs from class
org.apache.solr.handler.component.WPFastDistributedQueryComponent
at
org.apache.solr.handler.component.WPFastDistributedQueryComponent.handleResponses(WPFastDistributedQueryComponent.java:131)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
... 26 more




--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-IllegalAccessError-when-invoking-protected-method-from-another-class-in-the-same-package-p-tp4066904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Support for Mongolian language

2013-05-29 Thread Upayavira


On Wed, May 29, 2013, at 09:34 PM, bbarani wrote:
 Check out..
 
 wiki.apache.org/solr/LanguageAnalysis
 
 For some reason the above site takes a long time to open...

There's a known performance issue with the wiki. Admins are working on
it.

Upayavira


Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.

2013-05-29 Thread bbarani
My assumptions were right :)

I was able to fix this error by copying all my custom jars into the
webapp/WEB-INF/lib directory, and everything started working.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-IllegalAccessError-when-invoking-protected-method-from-another-class-in-the-same-package-p-tp4066904p4066906.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr 4.3: write.lock is not removed

2013-05-29 Thread Zhang, Lisheng
Hi,
 
I recently upgraded Solr from 3.6.1 to 4.3. It works well, but I noticed that
after indexing finishes, write.lock is NOT removed. If I index again later it
still works OK. Only after I shut down Tomcat is write.lock removed. This
behavior causes some problems, e.g. I could not use Luke to inspect the
indexed data.
 
I did not see any error/warning messages.
 
Is this the designed behavior? Can I get the old behavior (write.lock removed
after commit) through configuration?
 
Thanks very much for your help, Lisheng


Re: Solr query performance tool

2013-05-29 Thread Otis Gospodnetic
Hi,

The regular Solr log logs QTime for each query.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 29, 2013 5:59 PM, Spyros Lambrinidis spy...@peopleperhour.com
wrote:

 Hi,

 Lately we are seeing increased latency times on solr and we would like to
 know which queries / facet searches are the most time consuming and heavy
 for our system.

 Is there any tool equivalent to the mysql slow log? Does solr keep the times
 each query takes in some log?

 Thank you for your help.

 -S.


 --
 Spyros Lambrinidis
 Head of Engineering & Commando of
 PeoplePerHour.com <http://www.peopleperhour.com>
 Evmolpidon 23
 118 54, Gkazi
 Athens, Greece
 Tel: +30 210 3455480

 Follow us on Facebook http://www.facebook.com/peopleperhour
 Follow us on Twitter http://twitter.com/#%21/peopleperhour



Re: Solr query performance tool

2013-05-29 Thread Erick Erickson
The qtimes are in the solr log, you'll see lines like:
params={q=*:*} hits=32 status=0 QTime=5

QTime is the time spent serving the query but does NOT include
assembling the response.
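
If you want to go beyond eyeballing the log, a throwaway sketch like the one
below can pull out the slowest requests (the log file location and the exact
log line format are assumptions here; adjust the regex to whatever your
logging configuration actually emits):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlowQueryReport {

  static class Entry {
    final int qtime;
    final String params;
    Entry(int qtime, String params) { this.qtime = qtime; this.params = params; }
  }

  public static void main(String[] args) throws IOException {
    String logFile = args.length > 0 ? args[0] : "logs/solr.log";   // assumed location
    Pattern p = Pattern.compile("params=\\{([^}]*)\\}.*QTime=(\\d+)");
    List<Entry> entries = new ArrayList<Entry>();

    BufferedReader in = new BufferedReader(new FileReader(logFile));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        Matcher m = p.matcher(line);
        if (m.find()) {
          entries.add(new Entry(Integer.parseInt(m.group(2)), m.group(1)));
        }
      }
    } finally {
      in.close();
    }

    // Sort slowest first and print the top 20 requests with their params.
    Collections.sort(entries, new Comparator<Entry>() {
      public int compare(Entry a, Entry b) { return b.qtime - a.qtime; }
    });
    for (Entry e : entries.subList(0, Math.min(20, entries.size()))) {
      System.out.println(e.qtime + " ms  " + e.params);
    }
  }
}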

Best
Erick

On Wed, May 29, 2013 at 5:58 PM, Spyros Lambrinidis
spy...@peopleperhour.com wrote:
 Hi,

 Lately we are seeing increased latency times on solr and we would like to
 know which queries / facet searches are the most time consuming and heavy
 for our system.

 Is there any tool equivalent to the mysql slow log? Does solr keep the times
 each query takes in some log?

 Thank you for your help.

 -S.


 --
 Spyros Lambrinidis
 Head of Engineering & Commando of
 PeoplePerHour.com <http://www.peopleperhour.com>
 Evmolpidon 23
 118 54, Gkazi
 Athens, Greece
 Tel: +30 210 3455480

 Follow us on Facebook http://www.facebook.com/peopleperhour
 Follow us on Twitter http://twitter.com/#%21/peopleperhour


Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.

2013-05-29 Thread Chris Hostetter

: Subject: java.lang.IllegalAccessError when invoking protected method from
: another class in the same package path but different jar.
...
: I am overriding the query component and creating a custom component. I am
: using _responseDocs from org.apache.solr.handler.component.ResponseBuilder
: to get the values. I have my component in same package

_responseDocs is not protected, it is package-private, which is why you 
can't access it from a subclass in another *runtime* package.  Even if 
you put your custom component in the same org.apache.solr... package 
namespace, the runtime package is determined by the ClassLoader combined 
with the source package...

http://www.cooljeff.co.uk/2009/05/03/the-subtleties-of-overriding-package-private-methods/

...this is helpful to ensure plugins don't attempt to do things they 
shouldn't.

In general, the ResponseBuilder class internals aren't very friendly in 
terms of allowing custom components to interact with the intermediate 
results of other built-in components -- it's primarily designed around 
letting other internal Solr components share data with each other in 
(hopefully) well-tested ways.  Note that there is even a specific comment 
one line directly above the declaration of _responseDocs that alludes to 
it and several other variables being deliberately package-private...

  /* private... components that don't own these shouldn't use them */
  SolrDocumentList _responseDocs;
  StatsInfo _statsInfo;
  TermsComponent.TermsHelper _termsHelper;
  SimpleOrderedMap<List<NamedList<Object>>> _pivots;

If you want access to the SolrDocumentList containing the query results, 
the only safe way/time to do that is by fetching it out of the response 
(ResponseBuilder.rsp) after the QueryComponent has put it there in its 
finishStage -- until then ResponseBuilder._responseDocs may not be 
correct (i.e. distributed search, grouped search, etc.)
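
To make that concrete, here is a rough sketch of a component written that way
for a distributed setup like yours (the class and package names are made up for
illustration; in a plain non-distributed request the "response" entry holds a
ResultContext/DocList rather than a SolrDocumentList):

package com.example.solr;

import java.io.IOException;

import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ResponseDocsComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to do before the query runs
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do in the per-shard phase
  }

  @Override
  public void finishStage(ResponseBuilder rb) {
    // Runs after QueryComponent.finishStage in the GET_FIELDS stage as long as
    // this component is registered after "query" (e.g. as a last-component).
    if (rb.stage != ResponseBuilder.STAGE_GET_FIELDS) {
      return;
    }
    Object docs = rb.rsp.getValues().get("response");
    if (docs instanceof SolrDocumentList) {
      SolrDocumentList list = (SolrDocumentList) docs;
      // ... work with list.getNumFound(), iterate over the documents, etc.
    }
  }

  @Override
  public String getDescription() {
    return "Reads the merged doc list from the response, not ResponseBuilder internals";
  }

  @Override
  public String getSource() {
    return "";
  }
}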

-Hoss


multiple field join?

2013-05-29 Thread cmd.ares
http://wiki.apache.org/solr/Join
I found that Solr join is actually a SQL subquery. Does Solr support a 3-table join?
The SQL would look like this:
SELECT xxx, yyy 
FROM collection1
WHERE 
outer_id IN (SELECT inner_id FROM collection1 where zzz = vvv)
and 
outer_id2 IN (SELECT inner_id2 FROM collection1 where ttt = xxx)
and 
outer_id3 IN (SELECT inner_id3 FROM collection1 where ppp = rrr)

How do I write the Solr request URL?
Thanks.
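
For what it's worth, one way that SQL shape can be approximated (a sketch only,
assuming all of those rows live as documents in the same collection1 and that
the field names match the SQL above) is one {!join} filter query per IN
subquery, for example via SolrJ:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TripleJoinExample {
  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Equivalent request URL (one fq per subquery):
    // q=*:*&fl=xxx,yyy
    //   &fq={!join from=inner_id to=outer_id}zzz:vvv
    //   &fq={!join from=inner_id2 to=outer_id2}ttt:xxx
    //   &fq={!join from=inner_id3 to=outer_id3}ppp:rrr
    SolrQuery q = new SolrQuery("*:*");
    q.setFields("xxx", "yyy");
    q.addFilterQuery("{!join from=inner_id to=outer_id}zzz:vvv");
    q.addFilterQuery("{!join from=inner_id2 to=outer_id2}ttt:xxx");
    q.addFilterQuery("{!join from=inner_id3 to=outer_id3}ppp:rrr");

    QueryResponse rsp = solr.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}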



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiple-field-join-tp4066930.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with PatternReplaceCharFilter

2013-05-29 Thread jasimop
Honestly, I have no idea how to do that.
PatternReplaceCharFilter doesn't seem to have a parameter like
preservePositions="true" and optionally fillCharacter=" ".
And I don't think I can express this simply as a regex. How would I count, in a
pure regex, the length difference before and after the match?

Well, the specific problem is that when highlighting, the term positions are
wrong and the result is not a valid XML structure that I can handle.
I expect something like
<TextLine aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />
but I get
Tex<em>tLine</em> aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />

Thanks for your help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-PatternReplaceCharFilter-tp4066869p4066939.html
Sent from the Solr - User mailing list archive at Nabble.com.