Sorting results by last update date
Hi All,

I am trying to sort the results by last updated date. My URL looks as below:

*fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date asc*

With this I get the data in ascending order of last updated date. If I try to sort the data in descending order, I use the URL below:

*fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date desc*

Here the data set is not ordered properly; it mostly looks to me as if the data is ordered on the basis of score, not last updated date. Can somebody tell me what I am missing here, and why *desc* is not working properly for me?

Thanks
kamal
Re: What exactly happens to extant documents when the schema changes?
On Tue, May 28, 2013 at 2:20 PM, Upayavira u...@odoko.co.uk wrote:

> The schema provides Solr with a description of what it will find in the Lucene indexes. If you, for example, changed a string field to an integer in your schema, that'd mess things up bigtime.
>
> I recently had to upgrade a date field from the 1.4.1 date field format to the newer TrieDateField. Given I had to do it on a live index, I had to add a new field (just using copyField) and re-index over the top, as the old field was still in use. I guess, given my app now uses the new date field only, I could presumably reindex the old date field with the new TrieDateField format, but I'd want to try that before I do it for real.

Thank you for the insight. Unfortunately, with 20 million records and growing by hundreds each minute (social media posts), I don't see that I could ever reindex the data in a timely way.

> However, if you changed a single valued field to a multi-valued one, that's not an issue, as a field with a single value is still valid for a multi-valued field. Also, if you add a new field, existing documents will be considered to have no value in that field. If that is acceptable, then you're fine. I guess if you remove a field, then those fields will be ignored by Solr, and thus not impact anything. But I have to say, I've never tried that.
>
> Thus - changing the schema will only impact on future indexing. Whether your existing index will still be valid depends upon the changes you are making.
>
> Upayavira

Thanks.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
Re: What exactly happens to extant documents when the schema changes?
On Tue, May 28, 2013 at 3:58 PM, Jack Krupansky j...@basetechnology.com wrote:

> The technical answer: Undefined and not guaranteed.

I was afraid of that!

> Sure, you can experiment and see what the effects happen to be in any given release, and maybe they don't tend to change (too much) between most releases, but there is no guarantee that any given "change schema but keep existing data without a delete of directory contents and full reindex" will actually be benign or what you expect. As a general proposition, when it comes to changing the schema and not deleting the directory and doing a full reindex, don't do it! Of course, we all know not to try to walk on thin ice, but a lot of people will try to do it anyway - and maybe it happens that most of the time the results are benign.

In the case of this particular application, reindexing really is overly burdensome, as the application is performing hundreds of writes to the index per minute. How might I gauge how much spare I/O Solr could commit to a reindex? All the data that I need is in fact in stored fields. Note that because the social media application that feeds our Solr index is global, there are no 'off hours'.

> OTOH, you could file a Jira to propose that the effects of changing the schema but keeping the existing data should be precisely defined and documented, but, that could still change from release to release.

Seems like a lot of effort to document, for little benefit. I'm not going to file it. I would like to know, though: is the schema consulted at index time, query time, or both?

> From a practical perspective for your original question: If you suddenly add a field, there is no guarantee what will happen when you try to access that field for existing documents, or what will happen if you update existing documents. Sure, people can talk about what happens to be true today, but there is no guarantee for the future. Similarly for deleting a field from the schema, there is no guarantee about the status of existing data, even though people can chatter about what it seems to do today. Generally, you should design your application around contracts and what is guaranteed to be true, not what happens to be true from experiments or even experience. Granted, that is the theory, and sometimes you do need to rely on experimentation and folklore and spotty or ambiguous documentation, but to the extent possible, it is best to avoid explicitly trying to rely on undocumented, uncontracted behavior.

Thanks. The application does change (added features) and we do not want to lose old data.

> One question I asked long ago and never received an answer: what is the best practice for doing a full reindex - is it sufficient to first do a delete of *:*, or does the Solr index directory contents or even the directory itself need to be explicitly deleted first? I believe it is the latter, but the former seems to work, most of the time. Deleting the directory itself seems to be the best answer, to date - but no guarantees!

I don't have an answer for that, sorry!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
Re: Choosing specific fields for suggestions in SpellCheckerComponent
Hi Wilson,

I don't think SpellCheckComponent supports multiple fields in the same dictionary. Am I missing something?

On Wed, May 29, 2013 at 10:24 AM, Wilson Passos wrpas...@gmail.com wrote:

> Hi everyone,
>
> I've been searching about how to configure the SpellCheckerComponent in Solr 4.0 to support suggestion queries based on a subset of the configured fields in schema.xml. Let's say the spell checking is configured to use these 4 fields:
>
>   <field name="field1" type="text_general"/>
>   <field name="field2" type="text_general"/>
>   <field name="field3" type="text_general"/>
>   <field name="field4" type="text_general"/>
>
> I'd like to know if there's any possibility to dynamically set the SpellCheckerComponent to suggest terms using just fields field2 and field3, instead of the default behavior, which always includes suggestions across the 4 defined fields.
>
> Thanks in advance for any help!

--
Regards,
Shalin Shekhar Mangar.
Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery
I have opened https://issues.apache.org/jira/browse/SOLR-4870

On Tue, May 28, 2013 at 5:53 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> This sounds like a bug. I'll open an issue. Thanks!
>
> On Tue, May 28, 2013 at 2:29 PM, AlexeyK lex.kudi...@gmail.com wrote:
>
>> The cluster state problem reported above is not an issue - it was caused by our own code. Speaking about the update log, I have noticed a strange behavior concerning the replay. The replay is *supposed* to be done for a predefined number of log entries, but actually it is always done for the whole last 2 tlogs. RecentUpdates.update() reads the log within while (numUpdates < numRecordsToKeep), but numUpdates is never incremented, so it exits only when the reader reaches EOF.

--
Regards,
Shalin Shekhar Mangar.
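A simplified sketch of the loop shape being described may make the reported bug easier to see. All names here are hypothetical stand-ins, not the actual Solr source:

    // Intent: stop after numRecordsToKeep updates.
    // Reality: the counter is never advanced, so only EOF ends the loop.
    int numUpdates = 0;
    while (numUpdates < numRecordsToKeep) {
        Object entry = reader.next();   // hypothetical tlog reader call
        if (entry == null) break;       // EOF: the only exit actually taken
        collect(entry);
        // missing: numUpdates++;
    }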
Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery
On Thu, May 23, 2013 at 7:00 PM, AlexeyK lex.kudi...@gmail.com wrote:

> <snip/>
> From what I understood from the code, for each 'add' command there is a test for a 'delete by query'. If there is an older DBQ, it's run after the 'add' operation if its version > the 'add' version. In my case, there are a lot of documents to be inserted, and a single large DBQ. My question is: shouldn't this be done in bulks? Why is it necessary to run the DBQ after each insertion? Suppose there are 1000 insertions: it's run 1000 times.

As I understand it, this is done to handle out-of-order updates. Suppose a client makes a few add requests and then invokes a DBQ, but the DBQ reaches the replicas before the last add request. In such a case, the DBQ is executed after the add request to preserve consistency. We don't do that in bulk because we don't know how long to wait for all add requests to arrive. Also, the individual add requests may arrive via different threads (think connection reset from leader to replica).

That being said, the scenario you describe, of 1000 insertions causing DBQs to be run a large number of times (on recovery after restarting), could be optimized.

Note that the bug you discovered (SOLR-4870) does not affect log replay, because log replay on startup will replay all of the last two transaction logs (unless they end with a commit). Only PeerSync is affected by SOLR-4870.

You say that both nodes are leaders, but the comment inside DirectUpdateHandler2.addDoc() says that deletesAfter (i.e. reordered DBQs) should always be null on leaders. So there's definitely something fishy here. A quick review of the code leads me to believe that reordered DBQs can happen on a leader as well. I'll investigate further.

--
Regards,
Shalin Shekhar Mangar.
Re: Sorting results by last update date
On Wed, May 29, 2013 at 12:10 PM, Kamal Palei palei.ka...@gmail.com wrote:

> Hi All, I am trying to sort the results by last updated date.
> <snip/>
> Here the data set is not ordered properly; it mostly looks to me as if the data is ordered on the basis of score, not last updated date. Can somebody tell me what I am missing here, and why *desc* is not working properly for me?

What is the field type of last_update_date? Which version of Solr?

A side note: Using NOW in a filter query is inefficient because it doesn't use your filter cache effectively. Round it to the nearest time interval instead. See http://java.dzone.com/articles/solr-date-math-now-and-filter

--
Regards,
Shalin Shekhar Mangar.
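To make the rounding suggestion concrete, a hedged example of the same filter rewritten with Solr date math (shown URL-encoded, since a raw + in a URL would be read as a space):

    fq=last_updated_date:[NOW/DAY-60DAYS TO NOW/DAY%2B1DAY]

Because NOW/DAY rounds to midnight, the filter string is identical for every request made on the same day, so the filter cache entry can be reused; the unrounded NOW version above produces a new, uncacheable filter on every request.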
Re: delta-import tweaking?
Hi Shawn, and first off, thanks bunches for your pointers.

On Tue, 28 May 2013 09:31:54 -0600, Shawn Heisey s...@elyograg.org wrote:

> My workaround was to store the highest indexed autoincrement value in a location outside Solr. In my original Perl code, I dropped it into a file on NFS. The latest iteration of my indexing code (Java, using SolrJ) no longer uses DIH for regular indexing, but it still uses that stored autoincrement value, this time in another database table. I do still use full-import for complete index rebuilds.

Well, overall, after playing with it a bit last night, I decided to also go down the SolrJ way; we'll likely use this in the future anyway, as the rest of our environment is Java too, so going for it right now seems just the logical thing to do.

Thanks and all the best!
Kristian
Reindexing strategy
I see that I do need to reindex my Solr index. The index consists of 20 million documents, with a few hundred new documents added per minute (social media data). The documents are mostly smaller than 1 KiB of data, but some may go as large as 10 KiB. All the data is text, and all indexed fields are stored.

To reindex, I am considering adding a 'last_indexed' field, and having a Python or Java application pull out N results every T seconds when sorting on last_indexed asc. How might I determine good values for N and T? I would like to know when the Solr index is 'overloaded', or whatever happens to Solr when it is being pushed beyond the limits of its hardware. What should I be looking at to know if Solr is overstressed? Is looking at CPU and memory good enough? Is there a way to measure I/O to the disk on which the Solr index is stored?

Bear in mind that while the reindex is happening, clients will be performing searches and a few hundred documents will be written per minute. Note that the machine running Solr is an EC2 instance running on Amazon Web Services, and that the 'disk' on which the Solr index is stored is an EBS volume.

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
Re: Strange behavior on text field with number-text content
Hmmm, there are two things you _must_ get familiar with when diagnosing these <G>...

1> admin/analysis. That'll show you exactly what the analysis chain does, and it's not always obvious.
2> add debug=query to your input and look at the parsed query results. For instance, this: name:4nSolution Inc. parses as name:4nSolution defaultfield:inc.

That doesn't explain why name:4nSolution misses, except... your index chain has splitOnCaseChange=1 and your query chain has splitOnCaseChange=0, which doesn't seem right.

Best
Erick

On Tue, May 28, 2013 at 10:31 AM, Алексей Цой alexey...@gmail.com wrote:

2013/5/28 Michał Matulka michal.matu...@gowork.pl

Thanks for your responses, I must admit that after hours of trying I made some mistakes. So the most problematic phrase will now be: 4nSolution Inc., which cannot be found using the query name:4nSolution or even name:"4nSolution Inc.", but can be found using the following queries: name:nSolution, name:4, name:inc. Sorry for the mess; it turned out I didn't reindex fields after modifying the schema, so I thought that the problem also applied to "300letters".

The cause of all of this is the WordDelimiter filter, defined as follows:

  <fieldType name="text" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
      <!-- Case insensitive stop word removal. Add enablePositionIncrements=true in both the index and query analyzers to leave a 'gap' for more accurate phrase queries. -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
  </fieldType>

and I still don't know why it behaves like that - after all, there is the preserveOriginal attribute set to 1...

On 28.05.2013 14:21, Erick Erickson wrote:

Hmmm, with 4.x I get much different behavior than you're describing; what version of Solr are you using? Besides Alex's comments, try adding debug=query to the url and see what comes out from the query parser. A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do any analysis. Here's the javadoc...

/**
 * Default analyzer for types that only produces 1 verbatim token...
 * A maximum size of chars to be read must be specified
 */

so it's much like the string type. Which means I'm totally perplexed by your statement that "300" and "letters" return a hit. Have you perhaps changed the field definition and not re-indexed?
The behavior you're seeing really looks like somehow WordDelimiterFilterFactory is getting into your analysis chain with settings that don't mash the parts back together, i.e. you can set up WDDF to split on letter/number transitions, index each and NOT index the original, but I have no explanation for how that could happen with the field definition you indicated.

FWIW,
Erick

On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

What does the analyzer screen say in the Web AdminUI when you try to do that? Also, what are the tokens stored in the field (also in the Web AdminUI)? I think it is very strange to have a TextField without a tokenizer chain. Maybe you get a standard one assigned by default, but I don't know what the standard chain would be.

Regards,
Alex.

On 28 May 2013 04:44, Michał Matulka michal.matu...@gowork.pl wrote:

Hello, I've got following problem. I have a text type in my schema and a field name of that type. That field contains a data, there is, for example, record
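Following Erick's debug=query suggestion, a hedged example of inspecting the parsed query against a local Solr (URL, core, and field name are placeholders):

    curl 'http://localhost:8983/solr/select?q=name:4nSolution&debug=query&wt=xml'

The parsedquery section of the response shows what the query-time analysis chain actually produced, which makes index/query mismatches, like the differing splitOnCaseChange settings above, straightforward to spot.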
Re: Keeping a rolling window of indexes around solr
I suspect you're worrying about something you don't need to. At 1 insert every 30 seconds, and assuming 30,000,000 records will fit on a machine (I've seen this), you're talking 900,000,000 seconds' worth of data on a single box, or roughly 10,000 days' worth. Test, of course; YMMV. Or I'm misunderstanding what "1 log insert" means; I guess it could be a full log file...

But do the simple thing first: just let Solr do what it does by default, and periodically do a delete-by-query on documents you want to roll off the end. Especially since you say that queries happen every few days. The tricks for utilizing hot shards are probably not very useful for you with that low a query rate.

Test, of course...

Best
Erick

On Tue, May 28, 2013 at 8:42 PM, Saikat Kanjilal sxk1...@hotmail.com wrote:

Volume of data: 1 log insert every 30 seconds; queries done sporadically, asynchronously, every so often at a much lower frequency, every few days. Also, the majority of the requests are indeed going to be within a splice of time (typically hours, or at most a few days).

Type of queries:
Keyword or term search
Search by guid (or id as known in the solr world)
Reserved or percolation queries to be executed when new data becomes available
Search by dates as mentioned above

Regards

Sent from my iPhone

On May 28, 2013, at 4:25 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: This is kind of the approach used by elastic search , if I'm not using
: solrcloud will I be able to use shard aliasing, also with this approach
: how would replication work, is it even needed?

You haven't said much about the volume of data you expect to deal with, nor have you really explained what types of queries you intend to do -- ie: you said you were interested in a "rolling window of indexes" around n days of data, but you never clarified why you think a rolling window of indexes would be useful to you or how exactly you would use it.

The primary advantage of sharding by date is if you know that a large percentage of your queries are only going to be within a small range of time, and therefore you can optimize those requests to only hit the shards necessary to satisfy that small window of time. If the majority of requests are going to be across your entire n days of data, then date-based sharding doesn't really help you -- you can just use arbitrary (randomized) sharding, using periodic deleteByQuery commands to purge anything older than N days. Query the whole collection by default, and add a filter query if/when you want to restrict your search to only a narrow date range of documents.

This is the same general approach you would use on a non-distributed / non-SolrCloud setup, if you just had a single collection on a single master replicated to some number of slaves for horizontal scaling.

-Hoss
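A hedged sketch of the periodic roll-off Erick describes, using the stock XML update syntax (endpoint, field name, and the 30-day window are placeholders):

    curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type: text/xml' \
      --data-binary '<delete><query>log_date:[* TO NOW/DAY-30DAYS]</query></delete>'

Run from cron (or similar), this keeps the collection to a rolling window without any date-based sharding.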
Re: split document or not
But in this case, phrase frequency over the whole document will not be taken into account, because the document is split into subdocuments. Or is that not true?
Re: Note on The Book
IMHO I prefer narrative; as Erick says, explaining all use-cases is impossible, but covering the base cases is a good start. Either way, I miss a book about Solr different from a cookbook or a guide.

Regards.

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, May 29, 2013 at 12:19 PM, Erick Erickson wrote:

FWIW, picking up on Alexandre's point. One of my continual frustrations with virtually _all_ technical books is they become endless pages of details without ever mentioning why the hell I should care. Unfortunately, explaining use-cases for everything would only make the book about 10,000 pages long. Siiigh. I guess you can take this as a vote for narrative...

Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com wrote:

We'll have a blog for the book. We hope to have a first raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks. As soon as we get that process under control, we'll start the blog. I'll keep your email on file and keep you posted.

-- Jack Krupansky

-----Original Message-----
From: Swati Swoboda
Sent: Tuesday, May 28, 2013 1:36 PM
To: solr-user@lucene.apache.org
Subject: RE: Note on The Book

I'd definitely prefer the spiral bound as well. E-books are great, and your draft version seems very reasonably priced (aka I would definitely get it). Really looking forward to this. Is there a separate mailing list, etc., for the book, for those who would like to receive updates on the status of the book?

Thanks

Swati Swoboda
Software Developer - Igloo Software
+1.519.489.4120 sswob...@igloosoftware.com
Bring back Cake Fridays – watch a video you’ll actually like http://vimeo.com/64886237

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, May 23, 2013 7:15 PM
To: solr-user@lucene.apache.org
Subject: Note on The Book

To those of you who may have heard about the Lucene/Solr book that I and two others are writing on Lucene and Solr, some bad and good news.

The bad news: The book contract with O’Reilly has been canceled.

The good news: I’m going to proceed with self-publishing (possibly on Lulu or even Amazon) a somewhat reduced scope Solr-only Reference Guide (with hints of Lucene). The scope of the previous effort was too great, even for O’Reilly – a book larger than 800 pages (or even 600) that was heavy on reference and lighter on “guide” just wasn’t fitting in with their traditional “guide” model. In truth, Solr is just too complex for a simple guide that covers it all, let alone Lucene as well.

I’ll announce more details in the coming weeks, but I expect to publish an e-book-only version of the book, focused on Solr reference (and plenty of guide as well), possibly on Lulu, plus eventually publish 4-8 individual print volumes for people who really want the paper.

One model I may pursue is to offer the current, incomplete, raw, rough draft as a $7.99 e-book, with the promise of updates every two weeks or a month as new and revised content and new releases of Solr become available. Maybe the individual e-book volumes would be $2 or $3. These are just preliminary ideas. Feel free to let me know what seems reasonable or excessive.

For paper: Do people really want perfect bound, or would you prefer spiral bound that lies flat and folds back easily? I suppose we could offer both – which should be considered “premium”?

I’ll announce more details next week.
The immediate goal will be to get the “raw rough draft” available to everyone ASAP.

For those of you who have been early reviewers – your effort will not have been in vain. I have all your comments and will address them over the next month or two or three.

Just for some clarity, the existing Solr Wiki and even the recent contribution of the LucidWorks Solr Reference to Apache really are still great contributions to general knowledge about Solr, but the book is intended to go much deeper into detail, especially with loads of examples and a lot more narrative guide. For example, the book has a complete list of the analyzer filters, each with a clean one-liner description. Ditto for every parameter (although I would note that the LucidWorks Solr Reference does a decent job of that as well.) Maybe, eventually, everything in the book COULD (and will) be integrated into the standard Solr doc, but until then, a single, integrated reference really is sorely needed. And, the book has a lot of narrative guide and walking through examples as well. Over time, I’m sure both will evolve.

And just to be clear, the book is not a simple repurposing of the Solr wiki content – EVERY
How can a Tokenizer be CoreAware?
I am currently testing some things with Solr 4.0.0. I tried to make a tokenizer CoreAware, and was rewarded with:

Caused by: org.apache.solr.common.SolrException: Invalid 'Aware' object: com.basistech.rlp.solr.RLPTokenizerFactory@19336006 -- org.apache.solr.util.plugin.SolrCoreAware must be an instance of:
  [org.apache.solr.request.SolrRequestHandler]
  [org.apache.solr.response.QueryResponseWriter]
  [org.apache.solr.handler.component.SearchComponent]
  [org.apache.solr.update.processor.UpdateRequestProcessorFactory]
  [org.apache.solr.handler.component.ShardHandlerFactory]

I need this to allow cleanup of some cached items in the tokenizer. Questions:

1: Will a newer version allow me to do this directly?
2: Is there some other approach that anyone would recommend? I could, for example, make a fake object in the list above to act as a singleton with a static accessor, but that seems pretty ugly.
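A minimal sketch of the "fake object" workaround described above: a no-op SearchComponent (one of the types allowed to be SolrCoreAware) acting as a singleton bridge that runs cleanup callbacks, registered through a static accessor, when the core closes. All names are hypothetical, and this is one possible shape under Solr 4.x APIs rather than an endorsed pattern:

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import org.apache.solr.core.CloseHook;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.util.plugin.SolrCoreAware;

    public class CoreLifecycleBridge extends SearchComponent implements SolrCoreAware {
      // Static accessor: a tokenizer factory can register its cleanup work here.
      private static final List<Runnable> CLEANUP_HOOKS = new CopyOnWriteArrayList<Runnable>();

      public static void registerCleanup(Runnable r) { CLEANUP_HOOKS.add(r); }

      @Override
      public void inform(SolrCore core) {
        core.addCloseHook(new CloseHook() {
          @Override public void preClose(SolrCore c) {
            for (Runnable r : CLEANUP_HOOKS) r.run();  // e.g. clear cached items
          }
          @Override public void postClose(SolrCore c) {}
        });
      }

      @Override public void prepare(ResponseBuilder rb) {}
      @Override public void process(ResponseBuilder rb) {}
      @Override public String getDescription() { return "Core lifecycle bridge"; }
      @Override public String getSource() { return ""; }
    }

The component still has to be declared as a searchComponent in solrconfig.xml for Solr to instantiate it and call inform().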
Re: Reindexing strategy
I presume you are running Solr on a multi-core/CPU server. If you kept a single process hitting Solr to re-index, you'd be using just one of those cores. It would take as long as it takes; I can't see how you would 'overload' it that way.

I guess you could have a strategy that pulls 100 documents with an old last_indexed, and pushes them for re-indexing. If you get the full 100 docs, you make a subsequent request immediately. If you get less than 100 back, you know you're up-to-date and can wait, say, 30s before making another request.

Upayavira

On Wed, May 29, 2013, at 12:00 PM, Dotan Cohen wrote:
<snip/>
Re: Reindexing strategy
On Wed, May 29, 2013 at 2:41 PM, Upayavira u...@odoko.co.uk wrote:

> I presume you are running Solr on a multi-core/CPU server. If you kept a single process hitting Solr to re-index, you'd be using just one of those cores. It would take as long as it takes; I can't see how you would 'overload' it that way.

I mean 'overload' Solr in the sense that it cannot read, process, and write data fast enough because too much data is being handled. I remind you that this system is writing hundreds of documents per minute. Certainly there is a limit to what Solr can handle. I ask how to know how close I am to this limit.

> I guess you could have a strategy that pulls 100 documents with an old last_indexed, and pushes them for re-indexing. If you get the full 100 docs, you make a subsequent request immediately. If you get less than 100 back, you know you're up-to-date and can wait, say, 30s before making another request.

Actually, I would add a filter query for documents whose last_indexed value is before the last schema change, and stop when fewer documents were returned than were requested. Thanks.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
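A rough SolrJ sketch of that strategy (hedged: the field names and the schema-change cutoff are hypothetical, and it assumes all fields are stored, as stated earlier in the thread, so documents can be read back and re-added):

    import java.util.Date;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.common.SolrInputDocument;

    public class Reindexer {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        final int n = 100; // batch size: tune while watching QTime and IOwait
        while (true) {
          SolrQuery q = new SolrQuery("*:*");
          // Only documents not yet reindexed since the schema change (hypothetical cutoff).
          q.addFilterQuery("last_indexed:[* TO 2013-05-29T00:00:00Z]");
          q.set("sort", "last_indexed asc");
          q.setRows(n);
          SolrDocumentList docs = server.query(q).getResults();
          for (SolrDocument d : docs) {
            SolrInputDocument in = new SolrInputDocument();
            for (String f : d.getFieldNames()) {
              if ("_version_".equals(f)) continue; // avoid optimistic-concurrency clashes
              in.addField(f, d.getFieldValue(f));
            }
            in.setField("last_indexed", new Date()); // moves it out of the filter
            server.add(in);
          }
          server.commit();
          if (docs.size() < n) break; // caught up: every document has been reindexed
        }
      }
    }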
Re: Note on The Book
Perhaps you will enjoy mine, then: http://www.packtpub.com/apache-solr-for-indexing-data/book . I will send a formal announcement to the list a little later, but basically this is a book for advanced beginners and early intermediates, and it takes them from a basic index to multilingual indexing with bells and whistles. It covers a small part of Solr (Solr is big!), but shows how different parts work together. It's structured as a cookbook, but the narrative is a journey.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, May 29, 2013 at 7:33 AM, Yago Riveiro yago.rive...@gmail.com wrote:
<snip/>
Problem with xpath expression in data-config.xml
Replacing the contents of solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml with

  <dataConfig>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="beautybooks88"
              pk="link"
              url="http://beautybooks88.blogspot.com/feeds/posts/default"
              processor="XPathEntityProcessor"
              forEach="/feed/entry"
              transformer="DateFormatTransformer">
        <field column="source" xpath="/feed/title" commonField="true" />
        <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
        <field column="title" xpath="/feed/entry/title" />
        <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
        <field column="description" xpath="/feed/entry/content" stripHTML="true" />
        <field column="creator" xpath="/feed/entry/author" />
        <field column="item-subject" xpath="/feed/entry/category/@term" />
        <field column="date" xpath="/feed/entry/updated" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
      </entity>
    </document>
  </dataConfig>

and running the full dataimport from http://localhost:8983/solr/#/rss/dataimport//dataimport results in an error.

1) How could I have found the reason faster than I did? By looking into which log files?
2) If you remove the first occurrence of /@href above, the import succeeds. (Note that the same pattern works for the column "link".) What's the reason for that?

Best regards and thanks in advance,
Hans-Peter
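On question 1, a hedged pointer: DIH stack traces normally end up in Solr's main log (with the stock Solr 4.3 example that is the console output, and the log4j setup also writes under example/logs), and the Logging tab of the admin UI shows recent errors. The import's own status, including error and document counts, can also be fetched directly (core name taken from the example above):

    curl 'http://localhost:8983/solr/rss/dataimport?command=status'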
Advice : High-traffic web site
Hi Team,

Please, I need your advice. I have a high-traffic web site (100 million page views/month) across 22 countries, and I want to build a fast and powerful search engine. So I use Solr 4.3 and separate every country into its own collection, but I want to build the right structure to accommodate high traffic. So, what do you advise me to use: SolrCloud, master-slave, or multi-cores?

Thanks in advance.

Ramzi,
Re: split document or not
Do I need to first search for the whole document's id, and then search among its paragraphs stored in separate docs?
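If the paragraphs are indexed as separate documents carrying their parent's id, one hedged option is to let field collapsing do both steps in a single query (field names are hypothetical):

    curl 'http://localhost:8983/solr/select?q=some+phrase&group=true&group.field=parent_doc_id&group.limit=3'

Each parent id then comes back once with its top matching paragraphs. Scoring still happens per paragraph document, though, so this does not recover whole-document phrase frequency, which is exactly the trade-off raised earlier in the thread.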
Re: Note on The Book
Erick, your point is well taken. Although my primary interest/skill is to produce a solid foundation reference (including tons of examples), the real goal is to then build on top of that foundation. While I focus on the hard-core material - which really does include some narrative and lots of examples, in addition to tons of mere reference - my co-author, Ryan Tabora, will focus almost exclusively on... narrative and diagrams.

And when I say reference, I also mean lots of examples. Even as the hard-core reference stabilizes, the examples will continue to grow (like weeds!).

Once we get the current, existing, under-review chapters packaged into the new book and available for purchase and download (maybe Lulu, not decided) in a couple of weeks, it will be updated approximately every other week, both with additional reference material and with additional narrative and diagrams.

One of our priorities (after we get through "Stage 0" of the next few weeks) is to in fact start giving each of the long Deep Dive chapters enough narrative lead to basically say exactly that - why you should care. A longer-term priority is to improve the balance of narrative and hard-core reference.

Yeah, that will be a lot of pages. It already is. We were at 907 pages, and I was about to drop in another 166 pages on update handlers when O'Reilly threw up their hands and pulled the plug. I was estimating 1200 pages at that stage. And I'll probably have another 60-80 pages on update request processors within a week or so. With more to come. That did include a lot of hard-core material and example code for Lucene, which won't be in the new Solr-only book. By focusing on an e-book, the raw page count alone becomes moot.

We haven't given up on print - the intent is eventually to have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3 to $5 each) and slimmer print volumes for people who don't need everything in print. In fact, we will likely offer the revamped initial chapters of the book as a standalone introduction to Solr - narrative introduction (why should you care about Solr), basic concepts of Lucene and Solr (and why you should care!), brief tutorial walkthrough of the major feature areas of Solr, and a case study. The intent would be both an e-book and a slim print volume (75 pages?).

Another priority (beyond "Stage 0") is to develop a detailed roadmap diagram of Solr and how applications can use Solr, and then use that to show how each of the Deep Dive sections fits in (heavy reference, but gradually adding more narrative over time).

We will probably be very open to requests - what people really wish a book would actually do for them. The only request we won't be open to is to do it all in only 300 pages.

-- Jack Krupansky

-----Original Message-----
From: Erick Erickson
Sent: Wednesday, May 29, 2013 7:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

<snip/>
Re: What exactly happens to extant documents when the schema changes?
On 5/29/2013 1:07 AM, Dotan Cohen wrote:

> In the case of this particular application, reindexing really is overly burdensome as the application is performing hundreds of writes to the index per minute. How might I gauge how much spare I/O Solr could commit to a reindex? All the data that I need is in fact in stored fields. Note that because the social media application that feeds our Solr index is global, there are no 'off hours'.

I handle this in a very specific way with my sharded index. This won't work for all designs, and the precise procedure won't work for SolrCloud.

There is a 'live' and a 'build' core for each of my shards. When I want to reindex, the program makes a note of my current position for deletes, reinserts, and new documents. Then I use a DIH full-import from mysql into the build cores. Once the import is done, I run the update cycle of deletes, reinserts, and new documents on those build cores, using the position information noted earlier. Then I swap the cores so the new index is online. To adapt this for SolrCloud, I would need to use two collections, and update a collection alias for what is considered live.

To control the I/O and CPU usage, you might need some kind of throttling in your update/rebuild application. I don't need any throttling in my design. Because I'm using DIH, the import only uses a single thread for each shard on the server. I've got RAID10 for storage and half of the CPU cores are still available for queries, so it doesn't overwhelm the server.

The rebuild does lower performance, so I have the other copy of the index handle queries while the rebuild is underway. When the rebuild is done on one copy, I run it again on the other copy. Right now I'm half-upgraded -- one copy of my index is version 3.5.0, the other is 4.2.1. Switching to SolrCloud with sharding and replication would eliminate this flexibility, unless I maintained two separate clouds.

Thanks,
Shawn
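For reference, the swap step in a live/build layout like this maps onto the CoreAdmin SWAP action; a hedged example with hypothetical core names:

    curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1_live&other=shard1_build'

After the swap, queries hitting the "live" name are served by the freshly built index, and the old index stays available under the "build" name for the next cycle.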
[Announce] Apache Solr 4.1 with RankingAlgorithm 1.4.7 available now -- includes realtime-search with multiple granularities
I am very excited to announce the availability of Solr 4.3 with RankingAlgorithm40 1.4.8, with realtime-search with multiple granularities. realtime-search is very fast NRT and allows you to not only look up a document by id but also to search in realtime; see http://tgels.org/realtime-nrt.jsp. The update performance is about 70,000 docs/sec. The query performance is in ms, allowing you to query a 10m-document wikipedia index (complete index) in 50 ms.

This release includes realtime-search with multiple granularities: request/intra-request. The granularity attribute controls the NRT behavior. With attribute granularity="request", all search components like search, faceting, highlighting, etc. will see a consistent view of the index and will all report the same number of documents. With granularity="intrarequest", the components may each report the most recent changes to the index. realtime-search has been contributed back to Apache Solr, see https://issues.apache.org/jira/browse/SOLR-3816.

RankingAlgorithm 1.4.8 supports the entire Lucene Query Syntax, ± and/or boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix queries with boosting, etc., and is compatible with the Lucene 4.3 API.

You can get more information about realtime-search performance from here: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.3 with RankingAlgorithm40 1.4.8 from here: http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,
Nagendra Nagarajayya
http://solr-ra.tgels.org
http://elasticsearch-ra.tgels.org
http://rankingalgorithm.tgels.org

Note: 1. Apache Solr 4.3 with RankingAlgorithm40 1.4.8 is an external project.
Re: Advice : High-traffic web site
I don't see how multi-cores will help you. Both SolrCloud and Master-Slave can work for you. Of course, SolrCloud helps you in terms of maintaining higher availability due to replica/leader failover. If your queries are always going to be limited to one country, then creating a collection per country is fine.

On Wed, May 29, 2013 at 6:12 PM, Ramzi Alqrainy ramzi.alqra...@gmail.com wrote:
<snip/>

--
Regards,
Shalin Shekhar Mangar.
Re: Reindexing strategy
On 5/29/2013 6:01 AM, Dotan Cohen wrote:

> I mean 'overload' Solr in the sense that it cannot read, process, and write data fast enough because too much data is being handled. I remind you that this system is writing hundreds of documents per minute. Certainly there is a limit to what Solr can handle. I ask how to know how close I am to this limit.

It's impossible for us to give you hard numbers. You'll have to experiment to know how fast you can reindex without killing your servers. A basic tenet for such experimentation, and something you hopefully already know: you'll want to get baseline measurements before you begin testing, for comparison.

One of the most reliable Solr-specific indicators of pushing your hardware too hard is that the QTime on your queries will start to increase dramatically. Solr 4.1 and later has more granular query time statistics in the UI - the median and 95% numbers are much more important than the average.

Outside of that, if your overall IOwait CPU percentage starts getting near (or above) 30-50%, your server is struggling. If all of your CPU cores are staying near 100% usage, then it's REALLY struggling.

Assuming you have plenty of CPU cores, using fast storage and having plenty of extra RAM will alleviate much of the I/O bottleneck. The usual rule of thumb for good query performance is that you need enough RAM to put 50-100% of your index in the OS disk cache. For blazing performance during a rebuild, that becomes 100-200%. If you had 150%, that would probably keep most indexes well-cached even during a rebuild. A rebuild will always lower performance, even with lots of RAM.

My earlier reply to your other message has some other ideas that will hopefully help.

Thanks,
Shawn
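On the I/O measurement question, a hedged example of watching it on Linux with the standard sysstat tools (nothing Solr-specific):

    iostat -x 5

The avg-cpu %iowait column is the IOwait percentage discussed above, and the per-device %util column shows how saturated the volume holding the index is.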
Re: Replica shards not updating their index when update is sent to them
I found how to solve the problem.

After sending a file to be indexed to a replica shard (node2):

curl 'http://node2:8983/solr/update?commit=true' -H 'Content-type: text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">big moth</field></doc></add>'

I can send a commit param to the same shard, and then it gets updated:

curl 'http://node2:8983/solr/update?commit=true'

Another option is to send, from the beginning, a commitWithin param with some milliseconds instead of a commit directly. That way, the commit happens at most (the milliseconds specified) after, but the changes get reflected in all shards, including the replica shard that received the update request:

curl 'http://node2:8983/solr/update?commitWithin=...'

As these emails get archived, I hope this may help someone in the future.

Sebastián Ramírez

On Mon, May 20, 2013 at 4:32 PM, Sebastián Ramírez sebastian.rami...@senseta.com wrote:

Yes, it's happening with the latest version, 4.2.1. Yes, it's easy to reproduce. It happened using 3 virtual machines and also happened using 3 physical nodes. Here are the details:

I installed Hortonworks (a Hadoop distribution) on the 3 nodes. That installs Zookeeper. I used the example directory and copied it to the 3 nodes. I start Zookeeper on the 3 nodes. The first time, I run this command on each node, to start Solr:

java -jar -Dbootstrap_conf=true -DzkHost='node1,node2,node3' start.jar

As I understand, the -Dbootstrap_conf=true uploads the configuration to Zookeeper, so I don't need to do that the following times that I start each SolrCore. So, the following times, I run this on each node:

java -jar -DzkHost='node0,node1,node2' start.jar

Because I ran that command on node0 first, that node became the leader shard.

I send an update to the leader shard (in this case node0). I run:

curl 'http://node0:8983/solr/update?commit=true' -H 'Content-type: text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">buggy</field></doc></add>'

When I query any shard I get the correct result. I run:

curl 'http://node0:8983/solr/select?q=id:asdf'
curl 'http://node1:8983/solr/select?q=id:asdf'
curl 'http://node2:8983/solr/select?q=id:asdf'

(i.e. I send the query to each node), and then I get the expected response:

... <doc><str name="id">asdf</str><arr name="content"><str>buggy</str></arr>...</doc> ...

But when I send an update to a replica shard (node2), it is updated only in the leader shard (node0) and in the other replica (node1), not in the shard that received the update (node2). I send an update to the replica node2. I run:

curl 'http://node2:8983/solr/update?commit=true' -H 'Content-type: text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">big moth</field></doc></add>'

Then I query each node, and I receive the updated results only from the leader shard (node0) and the other replica shard (node1).

I run (leader, node0): curl 'http://node0:8983/solr/select?q=id:asdf' and I get:

... <doc><str name="id">asdf</str><arr name="content"><str>big moth</str></arr>...</doc> ...

I run (other replica, node1): curl 'http://node1:8983/solr/select?q=id:asdf' and I get:

... <doc><str name="id">asdf</str><arr name="content"><str>big moth</str></arr>...</doc> ...

I run (first replica, the one that received the update, node2): curl 'http://node2:8983/solr/select?q=id:asdf' and I get (the old result):

... <doc><str name="id">asdf</str><arr name="content"><str>buggy</str></arr>...</doc> ...
Thanks for your interest,

Sebastián Ramírez

On Mon, May 20, 2013 at 3:30 PM, Yonik Seeley yo...@lucidworks.com wrote:

> On Mon, May 20, 2013 at 4:21 PM, Sebastián Ramírez sebastian.rami...@senseta.com wrote:
>
>> When I send an update to a non-leader (replica) shard (B), the updated results are reflected in the leader shard (A) and in the other replica shard (C), but not in the shard that received the update (B).
>
> I've never seen that before. The replica that received the update isn't treated as special in any way by the code, so it's not clear how this could happen. What version of Solr is this (and does it happen with the latest version)? How easy is this to reproduce for you?
>
> -Yonik
> http://lucidworks.com
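For completeness, the same commitWithin approach from SolrJ, as a hedged sketch (SolrJ 4.x; uses the add(doc, commitWithinMs) overload):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://node2:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "asdf");
        doc.addField("content", "big moth");
        server.add(doc, 10000); // ask Solr to commit within 10 seconds
      }
    }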
RE: Why do FQs make my spelling suggestions so slow?
Andy,

I opened this ticket so that someone can eventually investigate: https://issues.apache.org/jira/browse/SOLR-4874

Just a sanity check: I see I had misspelled "maxCollations" as "maxCollation" in my prior response. When you tested with this set the same as maxCollationTries, did you correct my spelling? The thought is that by requiring it to return this many collations back, you are guaranteed to make it try the maximum number of tries every time, giving yourself a cleaner test. I am trying to isolate here whether spellcheck is not running the queries properly, or whether the queries just naturally take that long to run over and over again.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Andy Lester [mailto:a...@petdance.com]
Sent: Tuesday, May 28, 2013 4:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

Thanks for looking at this.

> What are the QTimes for the 0fq, 1fq, 2fq, 4fq cases with spellcheck entirely turned off? Is it about (or a little more than) half the total when maxCollationTries=1?

With spellcheck off I get 8ms for the 4fq query.

> Also, with the varying # of fq's, how many collation tries does it take to get 10 collations?

I don't know. How can I tell?

> Possibly, a better way to test this is to set maxCollations = maxCollationTries. The reason is that it quits trying once it finds maxCollations, so if with 0fq's lots of combinations can generate hits, it doesn't need to try very many to get to 10. But with more fq's, fewer collations will pan out, so now it is trying more, up to 100, before (if ever) it gets to 10.

It does just fine doing 100 collations so long as there are no FQs. It seems to me that the FQs are taking an inordinate amount of extra time: 100 collations in (roughly) the same amount of time as a single collation, so long as there are no FQs. Why are the FQs such a drag on the collation process?

> (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.

> So say with 2fq's it takes 10ms for the query to complete with spellcheck off, and 20ms with maxCollation = maxCollationTries = 1, then it will take about 110ms with maxCollation = maxCollationTries = 10.

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so long as I have FQs off. Add a single FQ and it becomes 13499ms. I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so long as I have FQs off. Add a single FQ and it becomes 62038ms.

> But I think you're just setting maxCollationTries too high. You're asking it to do too much work in trying tens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 tries. That's a big difference to the user when it's trying to figure out misspelled phrases.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Escaping character at Query
I use Solr 4.2.1 and I analyze that keyword: kelile&dimle at the admin page: WT kelile&dimle SF kelile&dimle TLCF kelile&dimle However when I escape that character and search it: solr/select?q=kelile\&dimle here is what I see:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">148</int>
      <lst name="params">
        <str name="dimle"/>
        <str name="q">kelile\</str>
      </lst>
    </lst>

I have edismax as default query parser. How can I escape that character, and why don't I get: <str name="q">kelile\&dimle</str> Any ideas?
RE: Choosing specific fields for suggestions in SpellCheckerComponent
I assume here you've got a spellcheck field like this:

  <field name="Spelling_Dictionary" type="text_general"/>
  <copyField source="field1" dest="Spelling_Dictionary" />
  <copyField source="field2" dest="Spelling_Dictionary" />
  <copyField source="field3" dest="Spelling_Dictionary" />
  <copyField source="field4" dest="Spelling_Dictionary" />

...so that a check against Spelling_Dictionary always checks all 4, right? This is the only way I know to approximate having it spellcheck across multiple fields. And as you have found, short of creating several separate versions of Spelling_Dictionary, there is no way to specify the individual fields a la carte. Although not supported, some of the work was done as part of SOLR-2993. Your best bet now is to use Spelling_Dictionary as a master dictionary, then use maxCollationTries to have it generate collations that only pertain to what the user actually searched against. This is less efficient and may not work well (or at all) with Suggest. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Wilson Passos [mailto:wrpas...@gmail.com] Sent: Tuesday, May 28, 2013 11:54 PM To: Solr User List Subject: Choosing specific fields for suggestions in SpellCheckerComponent Hi everyone, I've been searching about how to configure the SpellCheckerComponent in Solr 4.0 to support suggestion queries based on a subset of the configured fields in schema.xml. Let's say the spell checking is configured to use these 4 fields:

  <field name="field1" type="text_general"/>
  <field name="field2" type="text_general"/>
  <field name="field3" type="text_general"/>
  <field name="field4" type="text_general"/>

I'd like to know if there's any possibility to dynamically set the SpellCheckerComponent to suggest terms using just fields field2 and field3 instead of the default behavior, which always includes suggestions across the 4 defined fields. Thanks in advance for any help!
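The "several separate versions" workaround James mentions can be spelled out as multiple dictionaries registered on the SpellCheckComponent, each backed by its own copyField target -- a sketch only, with made-up names (Spelling_Dictionary_2_3 would be a second copyField target covering just field2 and field3):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">all_fields</str>
      <str name="field">Spelling_Dictionary</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">fields_2_3</str>
      <str name="field">Spelling_Dictionary_2_3</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
    </lst>
  </searchComponent>

The dictionary is then chosen per request with spellcheck.dictionary=fields_2_3. The cost is extra index size for each additional copyField target, which is why this only approximates a-la-carte field selection.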
Re: Why do FQs make my spelling suggestions so slow?
On May 29, 2013, at 9:46 AM, Dyer, James james.d...@ingramcontent.com wrote: Just an insanity check, I see I had misspelled maxCollations as maxCollation in my prior response. When you tested with this set the same as maxCollationTries, did you correct my spelling? Yes, definitely. Thanks for the ticket. I am looking at the effects of turning spellcheck.onlyMorePopular to true, which reduces the number of collations it seems to do, but doesn't affect the underlying question of "is the spellchecker doing FQs properly?" Thanks, Andy -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: Escaping character at Query
Hi, try with double quotation marks (""). Carlos. 2013/5/29 Furkan KAMACI furkankam...@gmail.com ...
using HTTP caching with shards in Solr 4.3
Hello, I'd like to take advantage of Solr's HTTP caching feature (httpCaching never304="false" in solrconfig.xml). It is behaving as expected when I do a standard query against a Solr instance and then repeat it: I receive an HTTP 304 (not modified) response. However, when using the shards functionality, I seem to be unable to get the HTTP 304 behavior. When sending a request to a Solr instance that includes other Solr instances in the shards parameter, a GET request is sent to the original Solr instance, but it turns around and sends POST requests to the Solr instances referenced in shards. Since POST requests cannot generate a 304, I seem to be unable to use HTTP caching with shards. Is there a way to make the original Solr instance query the shards with a GET method? Or some other way I can leverage HTTP caching when using shards? Thanks, Ty
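For reference, the setup Ty describes lives in the requestDispatcher section of solrconfig.xml -- a sketch with an assumed max-age value:

  <requestDispatcher>
    <httpCaching never304="false" lastModifiedFrom="openTime" etagSeed="Solr">
      <cacheControl>max-age=43200</cacheControl>
    </httpCaching>
  </requestDispatcher>

This controls the Last-Modified/ETag headers on responses the node serves itself; the inter-shard fan-out requests are issued by a separate code path over POST, which is why the downstream shards never see a conditional GET.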
[Announce] Apache Solr 4.3 with RankingAlgorithm 1.4.8 available now -- includes realtime-search with multiple granularities (correction)
I am very excited to announce the availability of Solr 4.3 with RankingAlgorithm40 1.4.8 with realtime-search with multiple granularities. realtime-search is very fast NRT and allows you to not only lookup a document by id but also allows you to search in realtime, see http://tgels.org/realtime-nrt.jsp. The update performance is about 70,000 docs / sec. The query performance is in ms, allows you to query a 10m wikipedia index (complete index) in 50 ms. This release includes realtime-search with multiple granularities, request/intra-request. The granularity attribute controls the NRT behavior. With attribute granularity=request, all search components like search, faceting, highlighting, etc. will see a consistent view of the index and will all report the same number of documents. With granularity=intrarequest, the components may each report the most recent changes to the index. realtime-search has been contributed back to Apache Solr, see https://issues.apache.org/jira/browse/SOLR-3816. RankingAlgorithm 1.4.8 supports the entire Lucene Query Syntax, ± and/or boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix queries with boosting, etc. and is compatible with the lucene 4.3 api. You can get more information about realtime-search performance from here: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x You can download Solr 4.3 with RankingAlgorithm40 1.4.8 from here: http://solr-ra.tgels.org Please download and give the new version a try. Regards, Nagendra Nagarajayya http://solr-ra.tgels.org http://elasticsearch-ra.tgels.org http://rankingalgorithm.tgels.org Note: 1. Apache Solr 4.3 with RankingAlgorithm40 1.4.8 is an external project.
Re: Escaping character at Query
When I write: solr/select?q=kelile\&dimle it still says:

  <lst name="params">
    <str name="dimle"/>
    <str name="q">kelile\</str>
  </lst>

2013/5/29 Carlos Bonilla carlosbonill...@gmail.com ...
Re: Escaping character at Query
You need to URL-encode the & as %26: ...solr/select?q=kelile%26dimle Normally, & introduces a new URL query parameter in the URL. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Wednesday, May 29, 2013 10:55 AM To: solr-user@lucene.apache.org Subject: Escaping character at Query ...
Re: Escaping character at Query
Hi, I meant: solr/select?q="kelile&dimle" Cheers. 2013/5/29 Jack Krupansky j...@basetechnology.com ...
Re: Problem with xpath expression in data-config.xml
On Wed, May 29, 2013 at 6:05 PM, Hans-Peter Stricker stric...@epublius.de wrote: Replacing the contents of solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml by

  <dataConfig>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="beautybooks88" pk="link"
              url="http://beautybooks88.blogspot.com/feeds/posts/default"
              processor="XPathEntityProcessor" forEach="/feed/entry"
              transformer="DateFormatTransformer">
        <field column="source" xpath="/feed/title" commonField="true" />
        <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
        <field column="title" xpath="/feed/entry/title" />
        <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
        <field column="description" xpath="/feed/entry/content" stripHTML="true"/>
        <field column="creator" xpath="/feed/entry/author" />
        <field column="item-subject" xpath="/feed/entry/category/@term"/>
        <field column="date" xpath="/feed/entry/updated" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
      </entity>
    </document>
  </dataConfig>

and running the full dataimport from http://localhost:8983/solr/#/rss/dataimport results in an error. 1) How could I have found the reason faster than I did - by looking into which log files? DIH uses the same log file as Solr. The name/location of the log file depends on your logging configuration. 2) If you remove the first occurrence of /@href above, the import succeeds. (Note that the same pattern works for column link.) What's the reason why?!! I think there is a bug here. In my tests, xpath=/root/a/@y works, xpath=/root/a[@x='1']/@y also works. But if you use them together, the one which is defined last returns null. I'll open an issue. -- Regards, Shalin Shekhar Mangar.
Re: Escaping character at Query
So, make it: solr/select?q=kelile%26dimle -- Jack Krupansky -Original Message- From: Carlos Bonilla Sent: Wednesday, May 29, 2013 11:39 AM To: solr-user@lucene.apache.org Subject: Re: Escaping character at Query ...
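As a general pattern -- an illustrative sketch, not from the thread -- any client that builds Solr URLs by string concatenation should percent-encode each parameter value, for example with java.net.URLEncoder:

  import java.net.URLEncoder;

  public class EncodeDemo {
      public static void main(String[] args) throws Exception {
          // & would otherwise start a new URL parameter, so it becomes %26
          String q = URLEncoder.encode("kelile&dimle", "UTF-8");
          System.out.println("solr/select?q=" + q); // solr/select?q=kelile%26dimle
      }
  }

Solr then sees the single parameter value kelile&dimle; the problem here is URL parsing, not query syntax, so no backslash escaping is needed.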
Re: Why do FQs make my spelling suggestions so slow?
I also have problems getting the SolrSpellChecker to utilise existing FQ params correctly. We have some fairly monster queries, e.g.: http://pastebin.com/4XzGpfeC I cannot seem to get our FQ parameters to be honored when generating results. In essence I am getting collations that yield no results when the filter query is applied. We have items that are by default not shown when out of stock or forthcoming; the user can select whether to show these or not. Is there something wrong with my query, or perhaps my use case is not supported? I'm using nested queries and local params etc. Would very much appreciate some assistance on this one, as 2 days' worth of hacking and pestering people on IRC have not yet yielded a solution for me. I'm not even sure what I am trying is even possible! Some sort of clarification on this would really help! Cheers Nick... On 29 May 2013 15:57, Andy Lester a...@petdance.com wrote: ... -- Nick Fellows DJdownload.com --- 10 Greenland Street London NW10ND United Kingdom --- n...@djdownload.com (E) --- www.djdownload.com
Re: Not able to search Spanish word with accent in solr
Solr returns error 500 when I post data with accented chars... Any solution for that?
Re: Re: error while indexing huge filesystem with data import handler and FileListEntityProcessor
The configuration works with LineEntityProcessor, with few documents (haven't tested with many documents yet). For information, this is the config:

  <dataConfig>
    <dataSource name="myfilelist" baseUrl="file:///D:/jed/noticesBib/" type="URLDataSource" encoding="UTF-8" />
    <document>
      <!-- config with a file containing the list of xml files to open -->
      <entity name="noticebib" dataSource="myfilelist"
              processor="LineEntityProcessor" acceptLineRegex="^.*\.xml$"
              url="listeNotices.txt" rootEntity="false"
              transformer="LogTransformer"
              logTemplate="In entity noticebib" logLevel="debug">
        <entity name="processorDocument" processor="XPathEntityProcessor"
                url="file:///D:/${noticebib.rawLine}" xsl="xslt/mnb/IXM_MNb.xsl"
                forEach="/record"
                transformer="fr.bnf.solr.BnfDateTransformer,LogTransformer"
                logTemplate="In entity processorDocument fichier: file:///D:/${noticebib.rawLine}"
                logLevel="debug">
          ... fields definition ...

file:///D:/jed/noticesBib/listeNotices.txt contains the following lines: jed/noticesBib/3/4/307/34307035.xml jed/noticesBib/3/4/307/34307082.xml jed/noticesBib/3/4/307/34307110.xml jed/noticesBib/3/4/307/34307197.xml jed/noticesBib/3/4/307/34307350.xml jed/noticesBib/3/4/307/34307399.xml ... (It could have contained the full paths from the start, but I wanted to test the concatenation of filenames.) That works fine, thanks for the help!! Next step: the same without using a file. (I'll write it in another post.) Regards, Jérôme
Re: Problem with xpath expression in data-config.xml
I created https://issues.apache.org/jira/browse/SOLR-4875 On Wed, May 29, 2013 at 9:15 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: ... -- Regards, Shalin Shekhar Mangar.
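To make the failure mode concrete, a minimal reproduction along the lines described above might look like this (hypothetical entity, file and column names -- not taken from the ticket):

  <entity name="repro" processor="XPathEntityProcessor" url="test.xml" forEach="/root">
    <!-- each xpath works on its own; defined together, the one listed last returns null -->
    <field column="plain" xpath="/root/a/@y" />
    <field column="filtered" xpath="/root/a[@x='1']/@y" />
  </entity>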
Re: Not able to search Spanish word with accent in solr
On 29 May 2013 21:39, jignesh js.vishava...@gmail.com wrote: Solr returns error 500 when I post data with accented chars... Any solution for that? [...] Please look in the Solr logs for the appropriate error message. Regards, Gora
Solr Cloud Using Zookeeper SASL
Hiya all, Got a question that I hope someone can help me with. I was just wondering if anyone has ever used Solr Cloud using Zookeepers that have SASL authentication turned on? I can't seem to find any documentation on it so any help at all would be amazing! Thanks, Don Tran Developer Omnifone Island Studios 47 British Grove London W4 2NL, UK T: +44 (0)20 8600 0580 F: +44 (0)20 8600 0581 S: DonTranOmnifone E: dt...@omnifone.com
RE: Why do FQs make my spelling suggestions so slow?
Instead of maxCollationTries=0, use a value greater than zero. Zero means not to check whether the collation will return hits. 1 means to test 1 possible combination against the index and return it only if it returns hits. 2 tries up to 2 possibilities, etc. As you have spellcheck.maxCollations=8, you'll probably want maxCollationTries at least that large. Maybe 10-20 would be better. Make it as low as possible to get generally good results, or as high as possible before the performance on a query with many misspelled words gets too bad. Also, use a spellcheck.count greater than 2. This is as many corrections per misspelled term as you want it to consider. If using DirectSolrSpellChecker, you can have it set low; 5-10 might be good. If using IndexBased- or FileBased spell checkers, use at least 10. Also, do not use onlyMorePopular unless you indeed want every term in the user's query to be replaced with higher-frequency terms (even correctly-spelled terms get replaced). If you want it to suggest even for words that are in the dictionary, try spellcheck.alternativeTermCount instead. Try setting it to about half of spellcheck.count (but at least 10 if using IndexBased- or FileBased spell checkers). James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Nicholas Fellows [mailto:n...@djdownload.com] Sent: Wednesday, May 29, 2013 11:06 AM To: solr-user@lucene.apache.org Subject: Re: Why do FQs make my spelling suggestions so slow? ...
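Pulling James's advice together, request-handler defaults along these lines would be a reasonable starting point -- a sketch with assumed values, to be tuned against your own index:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.maxCollations">8</str>
      <str name="spellcheck.maxCollationTries">10</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>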
Re: Not able to search Spanish word with accent in solr
On May 29, 2013, at 18:09, jignesh js.vishava...@gmail.com wrote: Solr returns error 500 when I post data with accented chars... Any solution for that? The solution probably involves using the correct encoding, and ensuring that the HTTP request sets the appropriate header values accordingly. In other words, more likely a pilot error than a SOLR error... at least that was the case for me :-)
Re: Why do FQs make my spelling suggestions so slow?
James, this is very useful information. Can you please add this to the wiki? On Wed, May 29, 2013 at 10:36 PM, Dyer, James james.d...@ingramcontent.com wrote: ... -- Regards, Shalin Shekhar Mangar.
RE: Why do FQs make my spelling suggestions so slow?
It has been in the wiki, more or less. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and following sections. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, May 29, 2013 12:41 PM To: solr-user@lucene.apache.org Subject: Re: Why do FQs make my spelling suggestions so slow? James, this is very useful information. Can you please add this to the wiki? ...
Seeming bug in ConcurrentUpdateSolrServer
The comment here is clearly wrong, since there is no division by two. I think that the code is wrong, because this results in not starting runners when it should start runners. Am I misanalyzing?

  if (runners.isEmpty()
      || (queue.remainingCapacity() < queue.size() // queue is half full and we can add more runners
          && runners.size() < threadCount)) {
Re: Indexing Solr, Multiple Doc Types. Production of Multiple Values for UniqueKey Field Using TemplateTransformer
: org.apache.solr.common.SolrException: Document contains multiple values for : uniqueKey field: uid=[A_1, dc1999fcf12df900] By the looks of things, your TemplateTransformer is properly creating a value of A_${atest.id} where ${atest.id} == 1 for that document ... the problem seems to be that somehow another value is getting put in your uid field containing dc1999fcf12df900 Based on your stack trace, i suspect that in addition to having DIH create a value for your uid field, you also have SignatureUpdateProcessorFactory configured (in your solrconfig.xml) to generate a synthetic unique id based on the signature of some fields as well... : org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:194) -Hoss
Re: Seeming bug in ConcurrentUpdateSolrServer
On Wed, May 29, 2013 at 11:29 PM, Benson Margulies bimargul...@gmail.com wrote: The comment here is clearly wrong, since there is no division by two. I think that the code is wrong, because this results in not starting runners when it should start runners. Am I misanalyzing?

  if (runners.isEmpty()
      || (queue.remainingCapacity() < queue.size() // queue is half full and we can add more runners
          && runners.size() < threadCount)) {

queue.remainingCapacity() returns capacity - queue.size(), so the comment is correct. -- Regards, Shalin Shekhar Mangar.
Re: Seeming bug in ConcurrentUpdateSolrServer
Ah. So now I have to find some other explanation of why it never creates more than one thread, even when I make a very deep queue and specify 6 threads. On Wed, May 29, 2013 at 2:25 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: ...
Re: SOLR 4.3.0 - How to make fq optional?
Hoss, for some reason this doesn't work when I pass the latlong value via query. This is the query; it just returns all the values for fname='peter' (doesn't filter for Tarmac, Florida): fl=*,score&rows=10&qt=findperson&fps_latlong=26.22084,-80.29&fps_fname=peter

solrconfig.xml:

  <lst name="appends">
    <str name="fq">{!switch case='*:*' default=$fq_bbox v=$fps_latlong}</str>
  </lst>
  <lst name="invariants">
    <str name="fq_bbox">_query_:"{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}"</str>
  </lst>

Works when used via custom component: This works fine when the latlong value is passed via a custom component. We have a custom component which gets the location name via query, calculates the corresponding lat/long co-ordinates stored in a TSV file, and passes the co-ordinates to the query.

Custom component config:

  <searchComponent name="geo" class="com.customcomponent">
    <str name="placenameFile">centroids.tsv</str>
    <str name="placenameQueryParam">fps_where</str>
    <str name="latQueryParam">fps_latitude</str>
    <str name="lonQueryParam">fps_longitude</str>
    <str name="latlonQueryParam">fps_latlong</str>
    <str name="distQueryParam">fps_dist</str>
    <float name="defaultDist">48.2803</float>
    <float name="boost">1.0</float>
  </searchComponent>

Custom component query: fl=*,score&rows=10&fps_where=new york, ny&qt=findperson&fps_latlong=26.22084,-80.29&fps_dist=.10&fps_fname=peter

Is it a bug?
Re: SOLR 4.3.0 - How to make fq optional?
: Hoss, for some reason this doesn't work when I pass the latlong value via : query.. ... : fl=*,score&rows=10&qt=findperson&fps_latlong=26.22084,-80.29&fps_fname=peter Hmmm, are these appends & invariants on your findperson requestHandler? What does debugQuery=true show you the applied filters are? : <lst name="invariants"> : <str name="fq_bbox">_query_:"{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}"</str> : </lst> Why do you have the _query_ hack in there? I haven't had a chance to test this, but perhaps that hack doesn't play nicely with local-param variable substitution? It should just be... <str name="fq_bbox">{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}</str> : This works fine when the latlong value is passed via custom component. We : have a custom component which gets the location name via query, calculates : the corresponding lat long co-ordinates stored in TSV file and passes the : co-ordinates to the query. Ok wait a minute -- all bets are off about this working if you have a custom component in the mix adding/removing params. You need to provide us with more details about exactly how your component works, where it's configured in the component list, and how it is adding the fps_latlong param it generates to the query, because my guess is one of two things is happening: 1) your component is doing its logic after the query parsing has already happened and the variables have been evaluated -- at which point fps_latlong isn't set yet, so you get the case='*:*' behavior 2) your component is doing its logic before the query parsing happens, but it is setting the value of fps_latlong in a way that the query parsing code doesn't see it when resolving the local variables. -Hoss
Problem with PatternReplaceCharFilter
Hi, I have a problem when using PatternReplaceCharFilter when indexing a field. I created the following field type:

  <fieldType name="testfield" class="solr.TextField">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&#60;TextDocument[^&#62;]*&#62;" replacement=""/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&#60;/TextDocument&#62;" replacement=""/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&#60;TextLine[^&#60;]+ content=\&#34;([^\&#34;]*)\&#34;[^/]+/&#62;" replacement="$1 "/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

And I created a field that is indexed and stored:

  <field name="testfield" type="testfield" indexed="true" stored="true" />

I need to index a document with such a structure in this field:

  <TextDocument filename="somefile.end" mime="..." created="..."><TextLine aa="bb" cc="dd" content="the content to search in" ee="ff" /><TextLine aa="bb" cc="dd" content="the second content line" ee="ff" /></TextDocument>

Basically I have some sort of XML structure; I need only to search in the content attribute, but when highlighting I need to get back to the enclosing XML tags. So with the 3 regexes I want to remove all unwanted tags and tokenize/index only the important data. I know that I could use HTMLStripCharFilterFactory, but then the tag names, attribute names and values get indexed too, and I don't want to search in that content. I read the following in the doc: NOTE: If you produce a phrase that has different length to source string and the field is used for highlighting for a term of the phrase, you will face a trouble. The thing is, why is this the case? When running the analysis from the Solr admin, the CharFilters generate "the content to search in the second content line", which looks perfect, but then the StandardTokenizer gets the start and end positions of the tokens wrong. Why is this the case? Does there exist another solution to my problem? Could I use the following method I saw in the doc of PatternReplaceCharFilter: protected int correct(int currentOff) Documentation: Retrieve the corrected offset. How could I solve such a task?
Support for Mongolian language
Hi All, Does Solr provide support for the Mongolian language? Also, which filters and tokenizers must be used for Chinese, Japanese and Korean languages? Regards, Sagar Chaturvedi
Re: SOLR 4.3.0 - How to make fq optional?
OK, I removed all my custom components from the findperson request handler:

  <requestHandler name="findperson" class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">lucene</str>
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="q.op">AND</str>
      <str name="qf">person_name_all_i</str>
      <int name="score_truncation_cliff">50</int>
      <int name="fps_dist">32</int>
      <str name="q">*:*</str>
      <lst name="appends">
        <str name="fq">{!switch case='*:*' default=$fq_bbox v=$fps_latlong}</str>
      </lst>
      <lst name="invariants">
        <str name="fq_bbox">_query_:"{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}"</str>
      </lst>
    </lst>
    <arr name="components">
      <str>query</str>
      <str>debug</str>
    </arr>
  </requestHandler>

My query: select?fl=*,score&rows=10&qt=findperson&fps_latlong=42.3482,-75.1890 The above query just returns everything back from Solr (it should only return results corresponding to the lat and long values passed in the query)... I even tried changing the below hack, but got the same results: <str name="fq_bbox">{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}</str> Not sure if I am missing something...
Re: Support for Mongolian language
Check out wiki.apache.org/solr/LanguageAnalysis For some reason the above site takes a long time to open...
Re: Query syntax error: Cannot parse ....
# has a separate meaning in a URL; you need to encode it. See http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters
Re: Grouping results based on the field which matched the query
Not sure if you are looking for this: http://wiki.apache.org/solr/FieldCollapsing
Re: SOLR 4.3.0 - How to make fq optional?
: <lst name="defaults">
:   ...
:   <lst name="appends">
:     <str name="fq">{!switch case='*:*' default=$fq_bbox v=$fps_latlong}</str>
:   </lst>
:   <lst name="invariants">
:     <str name="fq_bbox">_query_:"{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}"</str>
:   </lst>
: </lst>

...you have your appends and invariants nested inside your defaults -- they should be siblings...

  <lst name="defaults"> ... </lst>
  <lst name="appends"> ... </lst>
  <lst name="invariants"> ... </lst>

-Hoss
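Spelled out with the parameter names from this thread, the corrected handler would look roughly like this -- a sketch that also drops the _query_ hack as suggested above:

  <requestHandler name="findperson" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">lucene</str>
      <int name="rows">10</int>
      <int name="fps_dist">32</int>
      <str name="q">*:*</str>
    </lst>
    <lst name="appends">
      <str name="fq">{!switch case='*:*' default=$fq_bbox v=$fps_latlong}</str>
    </lst>
    <lst name="invariants">
      <str name="fq_bbox">{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}</str>
    </lst>
  </requestHandler>

With no fps_latlong parameter the switch parser appends the match-all fq; with one present, it resolves $fq_bbox into the bbox filter.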
Re: SOLR 4.3.0 - How to make fq optional?
I totally missed that... sorry about that :) It seems to work fine now...
RE: Note on The Book
Jack, I'd prefer tons of information instead of a meager 300 page book that leaves a lot of questions. I'm looking forward to a paperback or hardcover book and price doesn't really matter, it is going to be worth it anyway. Thanks, Markus -Original message- From:Jack Krupansky j...@basetechnology.com Sent: Wed 29-May-2013 15:10 To: solr-user@lucene.apache.org Subject: Re: Note on The Book Erick, your point is well taken. Although my primary interest/skill is to produce a solid foundation reference (including tons of examples), the real goal is to then build on top of that foundation. While I focus on the hard-core material - which really does include some narrative and lots of examples in addition to tons of mere reference, my co-author, Ryan Tabora, will focus almost exclusively on... narrative and diagrams. And when I say reference, I also mean lots of examples. Even as the hard-core reference stabilizes, the examples will continue to grow (like weeds!). Once we get the current, existing, under-review, chapters packaged into the new book and available for purchase and download (maybe Lulu, not decided) - available, in a couple of weeks, it will be updated approximately every other week, both with additional reference material, and additional narrative and diagrams. One of our priorities (after we get through Stage 0 of the next few weeks) is to in fact start giving each of the long Deep Dive Chapters enough narrative lead to basically say exactly that - why you should care. A longer-term priority is to improve the balance of narrative and hard-core reference. Yeah, that will be a lot of pages. It already is. We were at 907 pages and I was about to drop in another 166 pages on update handlers when O'Reilly threw up their hands and pulled the plug. I was estimating 1200 pages at that stage. And I'll probably have another 60-80 pages on update request processors within a week or so. With more to come. That did include a lot of hard-core material and example code for Lucene, which won't be in the new Solr-only book. By focusing on an e-book the raw page count alone becomes moot. We haven't given up on print - the intent is eventually to have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3 to $5 each) and slimmer print volumes for people who don't need everything in print. In fact, we will likely offer the revamped initial chapters of the book as a standalone introduction to Solr - narrative introduction (why should you care about Solr), basic concepts of Lucene and Solr (and why you should care!), brief tutorial walkthough of the major feature areas of Solr, and a case study. The intent would be both e-book and a slim print volume (75 pages?). Another priority (beyond Stage 0) is to develop a detailed roadmap diagram of Solr and how applications can use Solr, and then use that to show how each of the Deep Dive sections (heavy reference, but gradually adding more narrative over time.) We will probably be very open to requests - what people really wish a book would actually do for them. The only request we won't be open to is to do it all in only 300 pages. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Wednesday, May 29, 2013 7:19 AM To: solr-user@lucene.apache.org Subject: Re: Note on The Book FWIW, picking up on Alexandre's point. One of my continual frustrations with virtually _all_ technical books is they become endless pages of details without ever mentioning why the hell I should care. 
Unfortunately, explaining use-cases for everything would only make the book about 10,000 pages long. Siiigh. I guess you can take this as a vote for narrative Erick On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com wrote: We'll have a blog for the book. We hope to have a first raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks. As soon as we get that process under control, we'll start the blog. I'll keep your email on file and keep you posted. -- Jack Krupansky -Original Message- From: Swati Swoboda Sent: Tuesday, May 28, 2013 1:36 PM To: solr-user@lucene.apache.org Subject: RE: Note on The Book I'd definitely prefer the spiral bound as well. E-books are great and your draft version seems very reasonably priced (aka I would definitely get it). Really looking forward to this. Is there a separate mailing list / etc. for the book for those who would like to receive updates on the status of the book? Thanks Swati Swoboda Software Developer - Igloo Software +1.519.489.4120 sswob...@igloosoftware.com Bring back Cake Fridays – watch a video you’ll actually like http://vimeo.com/64886237 -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, May 23, 2013 7:15 PM
Re: Seeming bug in ConcurrentUpdateSolrServer
I now understand the algorithm, but I don't understand why it is the way it is. Consider one of these objects configured with a handful of threads and a pretty big queue. When the first request comes in, the object creates one runner. It then won't create a second runner until the queue reaches half-full. If the idea is that we want to pile up 'a lot' (half a queue) of work before sending any of it, why start that first runner? On Wed, May 29, 2013 at 2:45 PM, Benson Margulies bimargul...@gmail.com wrote: ...
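For anyone experimenting with this behavior, a minimal SolrJ harness looks something like the following -- a sketch against the 4.x API, with a made-up URL and document loop:

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class CussDemo {
      public static void main(String[] args) throws Exception {
          // queue of 1000 buffered requests, up to 6 runner threads draining it
          ConcurrentUpdateSolrServer server =
              new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 1000, 6);
          for (int i = 0; i < 100000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", Integer.toString(i));
              server.add(doc); // returns quickly; the runner threads do the HTTP work
          }
          server.blockUntilFinished();
          server.commit();
          server.shutdown();
      }
  }

Watching the thread count (e.g. in VisualVM) while varying the queue size and thread count is a direct way to check how many runners actually get started.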
RE: Slow Highlighter Performance Even Using FastVectorHighlighter
Andy, I don't understand why it's taking 7 secs to return highlights. The size of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set to 1024 for this verification purpose and that should be more than enough. The processor is plenty powerful enough as well. Running VisualVM shows all my CPU time being taken by mainly these 3 methods: org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset() org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset() org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap() That is a strange and interesting set of things to be spending most of your CPU time on. The implication, I think, is that the number of term matches in the document for terms in your query (or, at least, terms matching exact words or the beginning of phrases in your query) is extremely high. Perhaps that's coming from this partial word match you mention -- how does that work? -- Bryan My guess is that this has something to do with how I'm handling partial word matches/highlighting. I have set up another request handler that only searches the whole word fields and it returns in 850 ms with highlighting. Any ideas? - Andy -Original Message- From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] Sent: Monday, May 20, 2013 1:39 PM To: solr-user@lucene.apache.org Subject: RE: Slow Highlighter Performance Even Using FastVectorHighlighter My guess is that the problem is those 200M documents. FastVectorHighlighter is fast at deciding whether a match, especially a phrase, appears in a document, but it still starts out by walking the entire list of term vectors, and ends by breaking the document into candidate-snippet fragments, both processes that are proportional to the length of the document. It's hard to do much about the first, but for the second you could choose to expose FastVectorHighlighter's FieldPhraseList representation, and return offsets to the caller rather than fragments, building up your own snippets from a separate store of indexed files. This would also permit you to set stored=false, improving your memory/core size ratio, which I'm guessing could use some improving. It would require some work, and it would require you to store a representation of what was indexed outside the Solr core, in some constant-bytes-to-character representation that you can use offsets with (e.g. UTF-16, or ASCII+entity references). However, you may not need to do this -- it may be that you just need more memory for your search machine. Not JVM memory, but memory that the O/S can use as a file cache. What do you have now? That is, how much memory do you have that is not used by the JVM or other apps, and how big is your Solr core? One way to start getting a handle on where time is being spent is to set up VisualVM. Turn on CPU sampling, send in a bunch of the slow highlight queries, and look at where the time is being spent. If it's mostly in methods that are just reading from disk, buy more memory. If you're on Linux, look at what top is telling you. If the CPU usage is low and the wa number is above 1% more often than not, buy more memory (I don't know why that wa number makes sense, I just know that it has been a good rule of thumb for us).
-- Bryan -Original Message- From: Andy Brown [mailto:andy_br...@rhoworld.com] Sent: Monday, May 20, 2013 9:53 AM To: solr-user@lucene.apache.org Subject: Slow Highlighter Performance Even Using FastVectorHighlighter I'm providing a search feature in a web app that searches for documents that range in size from 1KB to 200MB of varying MIME types (PDF, DOC, etc). Currently there are about 3000 documents and this will continue to grow. I'm providing full word search and partial word search. For each document, there are three source fields that I'm interested in searching and highlighting on: name, description, and content. Since I'm providing both full and partial word search, I've created additional fields that get tokenized differently: name_par, description_par, and content_par. Those are indexed and stored as well for querying and highlighting. As suggested in the Solr wiki, I've got two catch all fields text and text_par for faster querying. An average search results page displays 25 results and I provide paging. I'm just returning the doc ID in my Solr search results and response times have been quite good (1 to 10 ms). The problem in performance occurs when I turn on highlighting. I'm already using the FastVectorHighlighter and depending on the query, it has taken as long as 15 seconds to get the highlight snippets. However, this isn't always the case. Certain query terms result in 1 sec or less response time. In any case, 15 seconds is way too long. I'm fairly new to Solr but I've spent days coming up with what
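As background for anyone tuning a similar setup: the FastVectorHighlighter needs term vectors with positions and offsets on every field it highlights -- schematically something like this (a generic sketch; the field name and type are assumptions):

  <field name="content" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

enabled at query time with parameters along the lines of:

  ...&hl=true&hl.fl=content&hl.useFastVectorHighlighter=true&hl.snippets=3

Those vectors are the "entire list of term vectors" Bryan describes the highlighter walking, which is why the cost grows with document length.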
Solr query performance tool
Hi, Lately we are seeing increased latency times on Solr and we would like to know which queries / facet searches are the most time-consuming and heavy for our system. Is there any tool equivalent to the MySQL slow log? Does Solr keep the times each query takes in some log? Thank you for your help. -S. -- Spyros Lambrinidis Head of Engineering & Commando of PeoplePerHour.com Evmolpidon 23 118 54, Gkazi Athens, Greece Tel: +30 210 3455480
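Solr does record per-request timings out of the box: every request that reaches a handler is logged with a QTime value in milliseconds, so the slowest queries can be pulled straight from the main log. An illustrative line (the exact format depends on your logging configuration; this one is made up):

  INFO: [collection1] webapp=/solr path=/select params={q=java&facet=true&facet.field=category} hits=1042 status=0 QTime=2175

Sorting these lines by the QTime field gives a rough equivalent of the MySQL slow query log; the admin UI's plugin/stats page also exposes average and cumulative request times per handler.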
Re: Problem with PatternReplaceCharFilter
Just replace the stripped markup with the equivalent number of spaces to maintain positions. Was there some specific problem you were encountering? -- Jack Krupansky -Original Message- From: jasimop Sent: Wednesday, May 29, 2013 4:12 PM To: solr-user@lucene.apache.org Subject: Problem with PatternReplaceCharFilter ...
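PatternReplaceCharFilter can't pad a replacement to the match length by itself, so one way to follow Jack's suggestion is to blank the markup before the text reaches Solr -- a standalone sketch of the idea (plain Java, not a Solr config; the tag patterns are taken from the original post):

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class BlankMarkup {

      private static final Pattern DOC_TAGS =
          Pattern.compile("</?TextDocument[^>]*>");
      private static final Pattern LINE_TAG =
          Pattern.compile("<TextLine[^<]+content=\"([^\"]*)\"[^/]+/>");

      // Overwrite markup characters with spaces so every surviving character
      // keeps its original offset; only the content="..." values remain.
      static String blank(String s) {
          StringBuilder out = new StringBuilder(s);
          Matcher m = LINE_TAG.matcher(s);
          while (m.find()) {
              for (int i = m.start(); i < m.end(); i++) {
                  if (i < m.start(1) || i >= m.end(1)) out.setCharAt(i, ' ');
              }
          }
          m = DOC_TAGS.matcher(s);
          while (m.find()) {
              for (int i = m.start(); i < m.end(); i++) out.setCharAt(i, ' ');
          }
          return out.toString();
      }
  }

Indexing the blanked text in the testfield keeps every highlight offset valid against the original XML, which is the round trip the highlighting use case needs.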
Re: Note on The Book
Markus, Okay, more pages it is! -- Jack Krupansky -Original Message- From: Markus Jelsma Sent: Wednesday, May 29, 2013 5:35 PM To: solr-user@lucene.apache.org Subject: RE: Note on The Book Jack, I'd prefer tons of information instead of a meager 300 page book that leaves a lot of questions. I'm looking forward to a paperback or hardcover book and price doesn't really matter, it is going to be worth it anyway. Thanks, Markus -Original message- From:Jack Krupansky j...@basetechnology.com Sent: Wed 29-May-2013 15:10 To: solr-user@lucene.apache.org Subject: Re: Note on The Book Erick, your point is well taken. Although my primary interest/skill is to produce a solid foundation reference (including tons of examples), the real goal is to then build on top of that foundation. While I focus on the hard-core material - which really does include some narrative and lots of examples in addition to tons of mere reference, my co-author, Ryan Tabora, will focus almost exclusively on... narrative and diagrams. And when I say reference, I also mean lots of examples. Even as the hard-core reference stabilizes, the examples will continue to grow (like weeds!). Once we get the current, existing, under-review, chapters packaged into the new book and available for purchase and download (maybe Lulu, not decided) - available, in a couple of weeks, it will be updated approximately every other week, both with additional reference material, and additional narrative and diagrams. One of our priorities (after we get through Stage 0 of the next few weeks) is to in fact start giving each of the long Deep Dive Chapters enough narrative lead to basically say exactly that - why you should care. A longer-term priority is to improve the balance of narrative and hard-core reference. Yeah, that will be a lot of pages. It already is. We were at 907 pages and I was about to drop in another 166 pages on update handlers when O'Reilly threw up their hands and pulled the plug. I was estimating 1200 pages at that stage. And I'll probably have another 60-80 pages on update request processors within a week or so. With more to come. That did include a lot of hard-core material and example code for Lucene, which won't be in the new Solr-only book. By focusing on an e-book the raw page count alone becomes moot. We haven't given up on print - the intent is eventually to have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3 to $5 each) and slimmer print volumes for people who don't need everything in print. In fact, we will likely offer the revamped initial chapters of the book as a standalone introduction to Solr - narrative introduction (why should you care about Solr), basic concepts of Lucene and Solr (and why you should care!), brief tutorial walkthough of the major feature areas of Solr, and a case study. The intent would be both e-book and a slim print volume (75 pages?). Another priority (beyond Stage 0) is to develop a detailed roadmap diagram of Solr and how applications can use Solr, and then use that to show how each of the Deep Dive sections (heavy reference, but gradually adding more narrative over time.) We will probably be very open to requests - what people really wish a book would actually do for them. The only request we won't be open to is to do it all in only 300 pages. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Wednesday, May 29, 2013 7:19 AM To: solr-user@lucene.apache.org Subject: Re: Note on The Book FWIW, picking up on Alexandre's point. 
One of my continual frustrations with virtually _all_ technical books is that they become endless pages of details without ever mentioning why the hell I should care. Unfortunately, explaining use-cases for everything would only make the book about 10,000 pages long. Siiigh. I guess you can take this as a vote for narrative. Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky j...@basetechnology.com wrote: We'll have a blog for the book. We hope to have a first raw/rough/partial draft published as an e-book in maybe 10 days to 2 weeks. As soon as we get that process under control, we'll start the blog. I'll keep your email on file and keep you posted. -- Jack Krupansky

-----Original Message----- From: Swati Swoboda Sent: Tuesday, May 28, 2013 1:36 PM To: solr-user@lucene.apache.org Subject: RE: Note on The Book

I'd definitely prefer the spiral-bound as well. E-books are great, and your draft version seems very reasonably priced (aka I would definitely get it). Really looking forward to this. Is there a separate mailing list, etc., for the book for those who would like to receive updates on its status? Thanks Swati Swoboda Software Developer - Igloo Software +1.519.489.4120 sswob...@igloosoftware.com
java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.
Hi, I am overriding the query component and creating a custom component. I am using _responseDocs from org.apache.solr.handler.component.ResponseBuilder to get the values. I have my component in the same package (org.apache.solr.handler.component) so that it can access the _responseDocs value. Everything works fine when I run the test for this component, but I get the error below when I package the custom component in a jar and place it in the lib directory (inside solr/lib - using the basic Jetty configuration). I assume this is because different class loaders load the classes at runtime. Is there a way to resolve this?

<str name="msg">java.lang.IllegalAccessError: tried to access field org.apache.solr.handler.component.ResponseBuilder._responseDocs from class org.apache.solr.handler.component.WPFastDistributedQueryComponent</str>
<str name="trace">java.lang.RuntimeException: java.lang.IllegalAccessError: tried to access field org.apache.solr.handler.component.ResponseBuilder._responseDocs from class org.apache.solr.handler.component.CustomComponent
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:365)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalAccessError: tried to access field org.apache.solr.handler.component.ResponseBuilder._responseDocs from class org.apache.solr.handler.component.WPFastDistributedQueryComponent
  at org.apache.solr.handler.component.WPFastDistributedQueryComponent.handleResponses(WPFastDistributedQueryComponent.java:131)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
  ... 26 more</str>
Re: Support for Mongolian language
On Wed, May 29, 2013, at 09:34 PM, bbarani wrote: Check out wiki.apache.org/solr/LanguageAnalysis. For some reason the above site takes a long time to open. There's a known performance issue with the wiki. Admins are working on it. Upayavira
Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.
My assumptions were right :) I was able to fix this error by copying all my custom jars into the webapp/WEB-INF/lib directory, and everything started working.
solr 4.3: write.lock is not removed
Hi, I recently upgraded Solr from 3.6.1 to 4.3. It works well, but I noticed that after indexing finishes, write.lock is NOT removed. If I index again later, it still works OK; only after I shut down Tomcat is write.lock removed. This behavior causes some problems - for example, I could not use Luke to inspect the indexed data. I did not see any error/warning messages. Is this the designed behavior? Can I get the old behavior (write.lock removed after commit) back through configuration? Thanks very much for your help, Lisheng
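In 4.x the core's IndexWriter stays open across commits, so with the default native lock the write.lock file is held until the core is closed; that is expected behavior rather than an error. The lock implementation is configurable in solrconfig.xml, though changing it won't restore the 3.x release-after-commit behavior - a sketch of where the setting lives, assuming a 4.x-style indexConfig section:

    <!-- solrconfig.xml: which lock implementation guards the index directory.
         With any of these, write.lock lives as long as the core's IndexWriter,
         so it is only removed at core unload or container shutdown. -->
    <indexConfig>
      <lockType>native</lockType> <!-- alternatives: simple, single -->
    </indexConfig>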
Re: Solr query performance tool
Hi, The regular Solr log logs QTime for each query. Otis Solr & ElasticSearch Support http://sematext.com/ On May 29, 2013 5:59 PM, Spyros Lambrinidis spy...@peopleperhour.com wrote: Hi, Lately we are seeing increased latency on Solr, and we would like to know which queries / facet searches are the most time-consuming and heavy for our system. Is there any tool equivalent to the MySQL slow log? Does Solr keep the time each query takes in some log? Thank you for your help. -S. -- Spyros Lambrinidis Head of Engineering Commando of PeoplePerHour.com http://www.peopleperhour.com Evmolpidon 23 118 54, Gkazi Athens, Greece Tel: +30 210 3455480 Follow us on Facebook http://www.facebook.com/peopleperhour Follow us on Twitter http://twitter.com/#%21/peopleperhour
Re: Solr query performance tool
The QTimes are in the Solr log; you'll see lines like: params={q=*:*} hits=32 status=0 QTime=5 QTime is the time spent serving the query but does NOT include assembling the response. Best, Erick On Wed, May 29, 2013 at 5:58 PM, Spyros Lambrinidis spy...@peopleperhour.com wrote: Hi, Lately we are seeing increased latency on Solr, and we would like to know which queries / facet searches are the most time-consuming and heavy for our system. Is there any tool equivalent to the MySQL slow log? Does Solr keep the time each query takes in some log? Thank you for your help. -S. -- Spyros Lambrinidis Head of Engineering Commando of PeoplePerHour.com http://www.peopleperhour.com Evmolpidon 23 118 54, Gkazi Athens, Greece Tel: +30 210 3455480 Follow us on Facebook http://www.facebook.com/peopleperhour Follow us on Twitter http://twitter.com/#%21/peopleperhour
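For a poor man's slow-query log along the lines Spyros asked about, a minimal sketch that scans a Solr log for the QTime=N pattern shown above and prints lines over a threshold. The log path and the 500 ms threshold are assumptions; adjust both for your setup.

    // Hypothetical slow-query extractor: prints log lines whose QTime exceeds a threshold.
    // Assumes the "QTime=<millis>" pattern shown above; path and threshold are examples.
    import java.io.BufferedReader;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SlowQueryLog {
        public static void main(String[] args) throws Exception {
            Pattern qtime = Pattern.compile("QTime=(\\d+)");
            int thresholdMs = 500; // assumption: tune for your latency budget
            try (BufferedReader in = Files.newBufferedReader(Paths.get("logs/solr.log"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    Matcher m = qtime.matcher(line);
                    if (m.find() && Integer.parseInt(m.group(1)) >= thresholdMs) {
                        System.out.println(line); // the line carries params={...}, so you see the query
                    }
                }
            }
        }
    }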
Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.
: Subject: java.lang.IllegalAccessError when invoking protected method from : another class in the same package path but different jar. ... : I am overriding the query component and creating a custom component. I am : using _responseDocs from org.apache.solr.handler.component.ResponseBuilder : to get the values. I have my component in the same package

_responseDocs is not protected; it is package-private, which is why you can't access it from a subclass in another *runtime* package. Even if you put your custom component in the same org.apache.solr... package namespace, the runtime package is determined by the ClassLoader combined with the source package...

http://www.cooljeff.co.uk/2009/05/03/the-subtleties-of-overriding-package-private-methods/

...this is helpful to ensure plugins don't attempt to do things they shouldn't. In general, the ResponseBuilder class internals aren't very friendly in terms of allowing custom components to interact with the intermediate results of other built-in components - it's primarily designed around letting internal Solr components share data with each other in (hopefully) well-tested ways. Note that there is even a specific comment one line directly above the declaration of _responseDocs that alludes to it and several other variables being deliberately package-private:

/* private... components that don't own these shouldn't use them */
SolrDocumentList _responseDocs;
StatsInfo _statsInfo;
TermsComponent.TermsHelper _termsHelper;
SimpleOrderedMap<List<NamedList<Object>>> _pivots;

If you want access to the SolrDocumentList containing the query results, the only safe way/time to do that is by fetching it out of the response (ResponseBuilder.rsp) after the QueryComponent has put it there in its finishStage - until then, ResponseBuilder._responseDocs may not be correct (i.e. distributed search, grouped search, etc...). -Hoss
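To make that concrete, a rough sketch (not from the thread) of a custom component reading the results the supported way in finishStage. It assumes the component is registered after the query component, and that QueryComponent has stored a SolrDocumentList under the "response" key of the response, which is how it behaves in current releases but could change.

    // Sketch: read query results from the response instead of ResponseBuilder._responseDocs.
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class MyResultsComponent extends SearchComponent {

        @Override
        public void prepare(ResponseBuilder rb) { /* nothing to prepare */ }

        @Override
        public void process(ResponseBuilder rb) { /* non-distributed case: results land in rb.rsp here */ }

        @Override
        public void finishStage(ResponseBuilder rb) {
            // Only look once the get-fields stage of a distributed request is done.
            if (rb.stage != ResponseBuilder.STAGE_GET_FIELDS) return;
            Object results = rb.rsp.getValues().get("response");
            if (results instanceof SolrDocumentList) {
                SolrDocumentList docs = (SolrDocumentList) results;
                // safe to inspect docs here; QueryComponent's finishStage has already run
            }
        }

        @Override
        public String getDescription() { return "example results reader"; }

        @Override
        public String getSource() { return ""; }
    }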
multiple field join?
http://wiki.apache.org/solr/Join

I found that the Solr join is basically a SQL subquery. Does Solr support a three-table join? The SQL looks like this:

SELECT xxx, yyy FROM collection1
WHERE outer_id IN (SELECT inner_id FROM collection1 WHERE zzz = vvv)
  AND outer_id2 IN (SELECT inner_id2 FROM collection1 WHERE ttt = xxx)
  AND outer_id3 IN (SELECT inner_id3 FROM collection1 WHERE ppp = rrr)

How do I write the Solr request URL? Thanks.
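For illustration, a sketch rather than a reply from the thread: Solr's {!join} does not nest the way SQL subqueries do, but an AND of IN-subqueries over a single core maps naturally onto one join per filter query, since fq clauses intersect. Using the field names from the SQL above, something like:

    q={!join from=inner_id to=outer_id}zzz:vvv
    &fq={!join from=inner_id2 to=outer_id2}ttt:xxx
    &fq={!join from=inner_id3 to=outer_id3}ppp:rrr
    &fl=xxx,yyy

Each join clause keeps the documents whose outer_id* value matches the inner_id* of documents satisfying the inner condition; whether terms like zzz:vvv are the right inner-query syntax depends on your field types.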
Re: Problem with PatternReplaceCharFilter
Honestly, I have no idea how to do that. PatternReplaceCharFilter doesn't seem to have a parameter like preservePositions="true" with an optional fillCharacter=" ". And I don't think I can express this purely as a regex - how would I count, in a pure regex, the length difference before and after the match? Well, the specific problem is that when highlighting, the term positions are wrong and the result is not a valid XML structure that I can handle. I expect something like:

<TextLine aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />

but I get:

Tex<em>tLine</em> aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />

Thanks for your help.
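A pure regex replacement string indeed can't compute "the same number of spaces as the match", but a replacement callback outside Solr can. A minimal sketch of the space-padding idea, done as pre-processing before the document is sent for indexing (assumes Java 11+; the patterns mirror the three charFilter rules in the schema above, and MarkupBlanker is a made-up name):

    // Sketch: replace every markup span with an equal-length run of spaces so that
    // character offsets in the result line up one-for-one with the original string.
    import java.util.regex.Pattern;

    public class MarkupBlanker {
        // Assumption: the same markup shapes as the three charFilter patterns above,
        // and content values never contain a quote or "/>".
        private static final Pattern MARKUP = Pattern.compile(
            "<TextDocument[^>]*>"            // opening TextDocument tag
            + "|</TextDocument>"             // closing TextDocument tag
            + "|<TextLine[^<]*?content=\""   // TextLine up to the content value
            + "|\"[^/>]*/>");                // content closing quote to end of tag

        public static String blank(String raw) {
            return MARKUP.matcher(raw).replaceAll(
                m -> " ".repeat(m.group().length())); // pad to identical length
        }

        public static void main(String[] args) {
            String raw = "<TextLine aa=\"bb\" cc=\"dd\" content=\"the content to search in\" ee=\"ff\" />";
            String blanked = MarkupBlanker.blank(raw);
            System.out.println(blanked);            // content text at its original offsets
            assert blanked.length() == raw.length();
        }
    }

Because every markup span keeps its length, token offsets computed on the blanked text line up character-for-character with the original XML, so highlight offsets can be mapped onto the stored original without producing broken tags.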