[jira] Commented: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512372 ] Pieter Berkel commented on SOLR-258: I've just tried this patch and the results are impressive! I agree with Ryan regarding the naming of 'pre', 'post' and 'inner'; using simple concrete words will make it easier for developers to understand the basic concepts. At first I was a little confused about how the 'gap' parameter was used; perhaps a name like 'interval' would be more indicative of its purpose?

While on the topic of gaps / intervals, I can imagine a case where one might want facet counts over non-linear intervals, for instance obtaining results for: Last 7 days, Last 30 days, Last 90 days, Last 6 months. Obviously you can achieve this by setting facet.date.gap=+1DAY and then post-processing the results, but a much more elegant solution would be to allow facet.date.gap (or another suitably named param) to accept a (comma-delimited) set of explicit partition dates: facet.date.start=NOW-6MONTHS/DAY facet.date.end=NOW/DAY facet.date.gap=NOW-90DAYS/DAY,NOW-30DAYS/DAY,NOW-7DAYS/DAY It would then be trivial to calculate facet counts for the ranges specified above.

It would be useful to make the 'start' and 'end' parameters optional. If not specified, 'start' should default to the earliest stored date value, and 'end' should default to the latest stored date value (assuming that's possible). Probably should return a 400 if 'gap' is not set.

My personal opinion is that 'end' should be a hard limit; the last gap should never go past 'end'. Given that the facet label is always generated from the lower value in the range, I don't think truncating the last 'gap' will cause problems; however, it may be helpful to return the actual date value for 'end' if it was specified as an offset of NOW. What might be a problem is when both start and end dates are specified as offsets of NOW: the value of NOW may not be constant for both values.
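The "post-process the results" fallback mentioned above can be sketched in a few lines. This is purely illustrative client-side code (the dictionary-of-dates shape is an assumption, not Solr's actual response format): it collapses per-day gap counts into overlapping "last N days" buckets.

```python
from datetime import date, timedelta

def rolling_buckets(daily_counts, today, windows=(7, 30, 90)):
    """Sum per-day facet counts (date -> count) into overlapping
    'last N days' buckets, as a client post-processing a
    facet.date.gap=+1DAY response might do."""
    totals = {}
    for n in windows:
        cutoff = today - timedelta(days=n)
        totals[n] = sum(c for d, c in daily_counts.items() if cutoff <= d < today)
    return totals

# one match per day for the last 120 days (yesterday backwards)
today = date(2007, 7, 13)
counts = {today - timedelta(days=i): 1 for i in range(1, 121)}
print(rolling_buckets(counts, today))  # {7: 7, 30: 30, 90: 90}
```

With explicit partition dates as proposed, this client-side pass would be unnecessary.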
In one of my tests, I set: facet.date.start=NOW-12MONTHS facet.date.end=NOW facet.date.gap=+1MONTH With some extra debugging output I can see that the value of NOW is usually the same: <str name="start">2006-07-13T06:06:07.397</str> <str name="end">2007-07-13T06:06:07.397</str> However, occasionally there is a difference: <str name="start">2006-07-13T05:48:23.014</str> <str name="end">2007-07-13T05:48:23.015</str> This difference alters the number of gaps calculated (+1 when the NOW values differ between start and end). Not sure how this could be fixed, but as you mentioned above, it will probably involve changing ft.toExternal(ft.toInternal(...)). Thanks again for creating this useful addition; I'll try to test it a bit more and see if I can find anything else.

Date based Facets - Key: SOLR-258 URL: https://issues.apache.org/jira/browse/SOLR-258 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch 1) Allow clients to express concepts like... * give me facet counts per day for every day this month. * give me facet counts per hour for every hour of today. * give me facet counts per hour for every hour of a specific day. * give me facet counts per hour for every hour of a specific day and give me facet counts for the number of matches before that day, or after that day. 2) Return all data in a way that makes it easy to use to build filter queries on those date ranges. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
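The off-by-one bucket count reported here is easy to reproduce in a sketch. This is illustrative only (Solr's actual date math lives in DateMathParser/DateField, and a 12-month span is approximated as 360 days to divide evenly): when 'start' and 'end' are resolved against two slightly different readings of NOW, the gap count grows by one.

```python
from datetime import datetime, timedelta

def gap_count(start, end, gap):
    """Number of gap-sized buckets needed to cover [start, end)."""
    n = 0
    while start < end:
        start += gap
        n += 1
    return n

gap = timedelta(days=30)
start_now = datetime(2007, 7, 13, 5, 48, 23, 14000)  # NOW when 'start' was parsed
end_now = start_now + timedelta(milliseconds=1)      # NOW drifted 1ms for 'end'

# same nominal span, but the drifting NOW adds a 13th bucket
print(gap_count(start_now - timedelta(days=360), start_now, gap))  # 12
print(gap_count(start_now - timedelta(days=360), end_now, gap))    # 13
```

Resolving both endpoints against one shared NOW reference, as the later comments discuss, keeps the count stable.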
[jira] Reopened: (SOLR-298) NGramTokenFilter missing in trunk
[ https://issues.apache.org/jira/browse/SOLR-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Peuss reopened SOLR-298: --- Sorry. I have not really stated that this issue is for Solr. In Solr-trunk I don't find the ngram filters: [EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2 $ grep -ril ngramfilter * [EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2 $ This was a fresh checkout. NGramTokenFilter missing in trunk - Key: SOLR-298 URL: https://issues.apache.org/jira/browse/SOLR-298 Project: Solr Issue Type: New Feature Components: search Reporter: Thomas Peuss Priority: Minor In one of the patches for SOLR-81 are NGram TokenFilters. Only the Tokenizers seem to have made it into Subversion (trunk). What happened to them?
nightly builds / solrj-lib
I just took a look at the files contained in: http://people.apache.org/builds/lucene/solr/nightly/ the dist directory does not include the .jar files needed for solrj. Can we modify the script to include 'solrj-lib'? ryan
[jira] Updated: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-139: --- Attachment: SOLR-269+139-ModifiableDocumentUpdateProcessor.patch implements modifiable documents in the SOLR-269 update processor chain. If the request does not have a 'mode' string, the ModifyDocumentProcessorFactory does not add a processor to the chain. Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Assignee: Ryan McKinley Attachments: SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
RE: [jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
comments? Hooray, and very cool. I didn't know you only needed a locking mechanism if you have multiple index writers, so the use of NoLock by default makes perfect sense. A quick stability update: since I first submitted the patch ~2 months ago we've had 0 lockups with it running in all our test environments. - will
[jira] Resolved: (SOLR-298) NGramTokenFilter missing in trunk
[ https://issues.apache.org/jira/browse/SOLR-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-298. --- Resolution: Fixed Thomas - not everything that was in SOLR-81 earlier was committed to Solr. Some was committed to Lucene in LUCENE-759.
Re: [jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
On 7/13/07, Will Johnson [EMAIL PROTECTED] wrote: Hooray, and very cool. I didn't know you only needed a locking mechanism if you only have multiple index writers so the use of NoLock by default makes perfect sense. For Lucene, you do (did, before the lockless commits patch) need locking (a read lock) even to open an index with a reader. The write lock is also still needed to avoid a reader changing the index via deletion at the same time a writer is. Solr coordinates this at a higher level, hence it's not really needed. -Yonik
Re: Rich Docs Indexing
On Jul 13, 2007, at 10:31 AM, Eric Pugh wrote: I wanted to see if I could get some momentum going on seeing if this is something that the committers want in Solr 1.3... I'd like to write up a wiki page similar to http://wiki.apache.org/solr/UpdateCSV that would give folks a chance to see what this code can do, but highlight that it is a wiki page about just a patch file? Would this be okay, or misleading to folks? Eric - kudos! Thanks for this contribution and the effort to document it. There is already precedent here - the Field Collapsing contribution has worked this way thus far too: http://wiki.apache.org/solr/FieldCollapsing So go for it! Erik, who will one day look at this contribution, but not for a few weeks, sorry
[jira] Commented: (SOLR-269) UpdateRequestProcessorFactory - process requests before submitting them
[ https://issues.apache.org/jira/browse/SOLR-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512502 ] Yonik Seeley commented on SOLR-269: --- How do you all feel about the basic structure? It's a go! It will get more complicated, I think, with document modification (SOLR-139). While it would be nice to keep the base stuff package protected, I'm more concerned with the other parts of the API that this moves front-and-center... mainly UpdateCommand and friends... those were really quick hacks on my part since there were no custom update handlers at the time. One clever change is to have the LogUpdateProcessorFactory skip building a LogUpdateProcessor if the log level is not INFO, rather than keeping a flag. Nice! I also need SOLR-139; btw, is it easy for you to commit this first to limit the size and scope of that patch?

UpdateRequestProcessorFactory - process requests before submitting them --- Key: SOLR-269 URL: https://issues.apache.org/jira/browse/SOLR-269 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 1.3 Attachments: SOLR-269-UpdateRequestProcessorFactory.patch, SOLR-269-UpdateRequestProcessorFactory.patch, SOLR-269-UpdateRequestProcessorFactory.patch, UpdateProcessor.patch A simple UpdateRequestProcessor was added to a bloated SOLR-133 commit. An UpdateRequestProcessor lets clients plug in logic after a document has been parsed and before it has been 'updated' with the index. This is a good place to add custom logic for: * transforming the document fields * fine grained authorization (can user X update document Y?) * allow update, but not delete (by query?)

<requestHandler name="/update" class="solr.StaxUpdateRequestHandler">
  <str name="update.processor.class">org.apache.solr.handler.UpdateRequestProcessor</str>
  <lst name="update.processor.args">
    ... (optionally pass in arguments to the factory init method) ...
  </lst>
</requestHandler>

http://www.nabble.com/Re%3A-svn-commit%3A-r547495---in--lucene-solr-trunk%3A-example-solr-conf-solrconfig.xml-src-java-org-apache-solr-handler-StaxUpdateRequestHandler.java-src-java-org-apache-solr-handler-UpdateRequestProcessor.jav-tf3950072.html#a11206583
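The factory/processor split discussed in this thread, including the "skip building a LogUpdateProcessor if the log level is not INFO" trick, can be sketched as a chain of responsibility. This is a hedged illustration in Python, not Solr's actual Java classes; the class names mirror the ones in the discussion.

```python
import logging

class UpdateProcessor:
    """A link in the processor chain: forwards to the next link, if any."""
    def __init__(self, nxt=None):
        self.nxt = nxt
    def process_add(self, cmd):
        if self.nxt:
            self.nxt.process_add(cmd)

class LogUpdateProcessor(UpdateProcessor):
    def process_add(self, cmd):
        logging.getLogger("update").info("add id=%s", cmd.get("id"))
        super().process_add(cmd)

class LogUpdateProcessorFactory:
    """If INFO logging is off, return the next link unchanged instead of
    inserting a processor that would check a flag on every document."""
    def get_instance(self, nxt):
        if not logging.getLogger("update").isEnabledFor(logging.INFO):
            return nxt
        return LogUpdateProcessor(nxt)
```

The factory decision is made once per request, so the per-document path pays nothing when logging is disabled.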
Rich Docs Indexing
Hi all, I've been working with the RichDocumentRequestHandler (http://issues.apache.org/jira/browse/SOLR-284) for the past weeks, and it seems to be working quite well. We discovered that when we throw a 27 MB PDF document at it we needed to beef up the Java heap size, and we haven't come up with a great solution for handling PDF documents that have a password on them, beyond not indexing them. I wanted to see if I could get some momentum going on seeing if this is something that the committers want in Solr 1.3... I'd like to write up a wiki page similar to http://wiki.apache.org/solr/UpdateCSV that would give folks a chance to see what this code can do, but highlight that it is a wiki page about just a patch file? Would this be okay, or misleading to folks? I've updated the patch to revision 555996. Thanks for your consideration! PS, is anyone going to be at OSCON in two weeks? I'd love to meet up with some other Solr folks. Eric --- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-240: -- Attachment: IndexWriter2.patch good point about recommending 'single' in the event of concurrency bugs. i've never really looked at the internals of the LockFactories so i'm going to punt on the subclass idea for now (i like it, i just don't have time to do it) but we can always redefine single later. (i'll open another bug if we're okay with committing this new patch as is) revised patch just changes the wording and suggested value in solrconfig.xml objections? java.io.IOException: Lock obtain timed out: SimpleFSLock Key: SOLR-240 URL: https://issues.apache.org/jira/browse/SOLR-240 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.2 Environment: windows xp Reporter: Will Johnson Attachments: IndexWriter.patch, IndexWriter2.patch, IndexWriter2.patch, IndexWriter2.patch, stacktrace.txt, ThrashIndex.java when running the soon to be attached sample application against solr it will eventually die. this same error has happened on both windows and rh4 linux. the app is just submitting docs with an id in batches of 10, performing a commit then repeating over and over again.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512558 ] Ryan McKinley commented on SOLR-139: the update handler knows much more about the index than we do outside Yes. The patch I just attached only deals with documents that are already committed. It uses req.getSearcher() to find existing documents. Beyond finding committed or non-committed documents, is there anything else that it can do better? Is it enough to add something to UpdateHandler to ask for a pending or committed document by uniqueId? I like having the actual document manipulation happen in the Processor because it is an easy place to put in other things, like grabbing stuff from a SQL database.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512553 ] Yonik Seeley commented on SOLR-139: --- Some general issues with update processors, modifiable documents, and keeping this stuff out of the update handler: the update handler knows much more about the index than we do outside, and it constrains implementation (and performance optimizations). For example, if modifiable documents were implemented in the update handler, and the old version of the document hasn't been committed yet, the update handler could buffer the complete modify command to be done at a later time (the *much* slower alternative is closing the writer and opening a reader to get the latest stored fields, then closing the reader and re-opening the writer).
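The buffering optimization Yonik describes can be sketched abstractly. All names here are hypothetical (this is not Solr's UpdateHandler API): a modify targeting a document added since the last commit is deferred and replayed at commit time, instead of forcing a writer close / reader open just to read the old stored fields.

```python
class BufferingUpdateHandler:
    """Sketch: defer modifies that target not-yet-committed documents."""
    def __init__(self):
        self.pending_ids = set()   # ids added but not yet committed
        self.deferred = []         # modify commands waiting for commit

    def add(self, doc):
        self.pending_ids.add(doc["id"])

    def modify(self, cmd, apply_now):
        if cmd["id"] in self.pending_ids:
            self.deferred.append(cmd)   # old version isn't readable yet
        else:
            apply_now(cmd)              # committed: stored fields available

    def commit(self, apply_now):
        for cmd in self.deferred:       # replay once stored fields exist
            apply_now(cmd)
        self.deferred.clear()
        self.pending_ids.clear()
```

This is exactly the kind of choice that only the update handler, which knows what is pending versus committed, is positioned to make.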
[jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512554 ] Yonik Seeley commented on SOLR-240: --- No objections... a hang (in the event of bugs) will suffice for now. java.io.IOException: Lock obtain timed out: SimpleFSLock Key: SOLR-240 URL: https://issues.apache.org/jira/browse/SOLR-240 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.2 Environment: windows xp Reporter: Will Johnson Attachments: IndexWriter.patch, IndexWriter2.patch, IndexWriter2.patch, IndexWriter2.patch, stacktrace.txt, ThrashIndex.java when running the soon to be attached sample application against solr it will eventually die. this same error has happened on both windows and rh4 linux. the app is just submitting docs with an id in batches of 10, performing a commit then repeating over and over again. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-139: --- Attachment: SOLR-139-ModifyInputDocuments.patch Updated patch to work with SOLR-269 UpdateRequestProcessors. One thing I think is weird about this is that it uses request parameters to specify the mode rather than the add command. That is, to modify a document you have to send:

/update?mode=OVERWRITE,count:INCREMENT
<add>
  <doc>
    <field name="id">1</field>
    <field name="count">5</field>
  </doc>
</add>

rather than:

<add mode="OVERWRITE,count:INCREMENT">
  <doc>
    <field name="id">1</field>
    <field name="count">5</field>
  </doc>
</add>

This is fine, but it makes it hard to have an example 'modify' xml document.
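The field-level semantics of mode=OVERWRITE,count:INCREMENT in the example can be sketched as a merge over the stored document. This is an illustrative function, not the patch's actual code; the document-as-dict shape is an assumption.

```python
def apply_modify(existing, incoming, default_mode="OVERWRITE", field_modes=None):
    """Merge incoming fields into an existing document per field mode:
    OVERWRITE replaces the stored value, INCREMENT adds to it
    (mirroring mode=OVERWRITE,count:INCREMENT above)."""
    field_modes = field_modes or {}
    merged = dict(existing)
    for name, value in incoming.items():
        if field_modes.get(name, default_mode) == "INCREMENT":
            merged[name] = merged.get(name, 0) + value
        else:
            merged[name] = value
    return merged

doc = apply_modify({"id": "1", "count": 5, "title": "old"},
                   {"id": "1", "count": 5, "title": "new"},
                   field_modes={"count": "INCREMENT"})
print(doc)  # {'id': '1', 'count': 10, 'title': 'new'}
```

As the comment notes, INCREMENT really only makes sense for numeric fields.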
[jira] Commented: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512559 ] Hoss Man commented on SOLR-258: --- 1) i'm happy to break out the FacetParams into their own interface ... but i'd like to track that in a separate refactoring commit (since the existing facet params are already in SolrParams) 2) i clearly anticipated the FacetDateOther.get( bogus ) problem .. but for some reason i thought it returned null ... i'll fix that. 3) i actually considered before, between, and after originally but decided they were too long (i was trying to find a way to make start shorter as well) ... but two people thinking they're better convinces me. 4) my hesitation about renaming gap to interval is that i wanted to leave the door open for a separate interval option (to define a gap between the gaps, so to speak) later should it be desired ... see the questions i listed when opening the bug. 5) i don't think this code makes sense for non-linear intervals ... the problem i'm really trying to solve here is using 3 params to express equal date divisions across an arbitrarily long time scale. for the example you listed, simple facet.query options probably make more sense (although you do have me now thinking that another good faceting option would be some new facet.range where many values can be specified, they all get sorted, and then ranges are built between each successive value ... but that should be a separate issue) 6) i want to make start and end optional, but for now i can't think of a clean/fast way to do end ... and we can always add defaults later. 7) my preference is for every count to cover a range of exactly gap, but i can definitely see where having a hard cutoff of end is useful, so i'll make it an option ... name suggestions? i'll make sure to echo the value of end as well so it's easy to build filter queries for that last range ... probably should have it anyway to build filter queries on between and after. should the ranges used to compute the between and after counts depend on where the last range ended or on the literal end param? 8) the NOW variance really bugs me ... back when i built DateMathParser i anticipated this by making the parser have a fixed concept of NOW which could be used to parse multiple strings, but i don't know why i didn't consider it when working on this new patch. the real problem is that right now DateField is relied on to do all the parsing, and a single instance can't have a fixed notion of NOW ... it builds a new DateMathParser each time ... i think i'm going to have to do some heavy refactoring to fix this, which is annoying -- but i don't want to commit without fixing it, even if it takes a while: any bug that can produce an off-by-1-millisecond discrepancy should die a horrible horrible freaking death.
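The facet.range idea floated in point 5 (many values supplied, sorted, with ranges built between each successive pair) is simple to sketch. This is a hypothetical illustration of that proposal, not an implemented Solr parameter; date-math strings like NOW-90DAYS are shown already resolved to concrete dates.

```python
from datetime import date, timedelta

def ranges_between(boundaries):
    """Sort the supplied boundary values and build a [lo, hi) range
    between each successive pair, as the proposed facet.range would."""
    vs = sorted(boundaries)
    return list(zip(vs, vs[1:]))

now = date(2007, 7, 13)
# boundaries for NOW-90DAYS, NOW-30DAYS, NOW-7DAYS, NOW, resolved up front
bounds = [now - timedelta(days=d) for d in (90, 30, 7, 0)]
ranges = ranges_between(bounds)
# 3 ranges: [now-90d, now-30d), [now-30d, now-7d), [now-7d, now)
```

N boundary values always yield N-1 ranges, which is what makes this a natural fit for the "Last 7/30/90 days" use case from earlier in the thread.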
[jira] Commented: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512571 ] Ryan McKinley commented on SOLR-258: but i'd like to track that in a separate refactoring commit (since the existing facet params are already in SolrParams) sounds good. ... originally but decided they were too long .. In general, I favor longer self-explanatory param names over short ones. It is kind of annoying to have to look up 'pf', 'bq' to decode what they mean. - - - Again, this is really great. Now we can build the ubiquitous calendar widget from Solr. Thanks!
Re: [jira] Reopened: (SOLR-298) NGramTokenFilter missing in trunk
On 13-Jul-07, at 12:48 AM, Thomas Peuss (JIRA) wrote: [ https://issues.apache.org/jira/browse/SOLR-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Peuss reopened SOLR-298: --- Sorry. I have not really stated that this issue is for Solr. In Solr-trunk I don't find the ngram filters: [EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2 $ grep -ril ngramfilter * [EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2 $ This was a fresh checkout. Solr includes these analyzers as a lucene jar, not source. -Mike
[jira] Resolved: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-240. --- Resolution: Fixed Fix Version/s: 1.3 Assignee: Hoss Man Committed revision 556099.
[jira] Updated: (SOLR-300) create subclass of SingleInstanceLockFactory which warns loudly in the event of concurrent lock attempts
[ https://issues.apache.org/jira/browse/SOLR-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-300: -- Component/s: update Priority: Minor (was: Major) Issue Type: Wish (was: Improvement) create subclass of SingleInstanceLockFactory which warns loudly in the event of concurrent lock attempts Key: SOLR-300 URL: https://issues.apache.org/jira/browse/SOLR-300 Project: Solr Issue Type: Wish Components: update Reporter: Hoss Man Priority: Minor as noted by yonik in SOLR-240... How about SingleInstanceLockFactory to aid in catching concurrency bugs? ... or even better, a subclass or other implementation: SingleInstanceWarnLockFactory or SingleInstanceCoordinatedLockFactory that logs a failure if obtain() is called for a lock that is already locked. we should create a new subclass like Yonik describes and change SolrIndexWriter to use this subclass if/when single is specified as the lockType.
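The wished-for SingleInstanceWarnLockFactory behavior can be sketched as follows. This is an illustrative Python analogue (the real thing would subclass Lucene's LockFactory in Java): a non-blocking obtain() that logs loudly when the lock is already held, since in Solr that indicates a concurrency bug rather than normal contention.

```python
import logging
import threading

class SingleInstanceWarnLock:
    """Sketch: obtain() on an already-held lock logs an error instead
    of silently blocking, because Solr coordinates writers at a
    higher level and contention here means something is wrong."""
    def __init__(self, name="write.lock"):
        self._lock = threading.Lock()
        self.name = name

    def obtain(self):
        ok = self._lock.acquire(blocking=False)
        if not ok:
            logging.getLogger("lock").error(
                "obtain() on already-locked %s: possible concurrency bug",
                self.name)
        return ok

    def release(self):
        self._lock.release()
```

Compared with a plain hang, the loud failure makes the bug show up in logs instead of as a mysterious stall.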
[jira] Updated: (SOLR-287) set commitMaxTime when adding a document
[ https://issues.apache.org/jira/browse/SOLR-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-287: --- Attachment: SOLR-287-AddCommitMaxTime.patch No real changes - updated to work with trunk. Without objection, I think this should be added soon... set commitMaxTime when adding a document Key: SOLR-287 URL: https://issues.apache.org/jira/browse/SOLR-287 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Priority: Minor Attachments: SOLR-287-AddCommitMaxTime.patch, SOLR-287-AddCommitMaxTime.patch Rather than setting a global autoCommit maxTime, it would be nice to set a maximum time for a single add command. This patch adds: <add commitMaxTime="1000"> ... </add> to add the document within 1 sec.
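One way to reason about a per-add commitMaxTime is as deadline tracking: each add may tighten the pending commit deadline, and the server commits once the earliest deadline passes. The sketch below is purely hypothetical bookkeeping, not the patch's actual implementation.

```python
import time

class CommitDeadlineTracker:
    """Sketch: remember the earliest commit deadline requested by any
    individual add command (add commitMaxTime=... in milliseconds)."""
    def __init__(self):
        self.deadline = None

    def on_add(self, commit_max_time_ms=None):
        if commit_max_time_ms is None:
            return  # this add imposes no deadline of its own
        candidate = time.monotonic() + commit_max_time_ms / 1000.0
        if self.deadline is None or candidate < self.deadline:
            self.deadline = candidate  # keep only the earliest deadline

    def commit_due(self):
        return self.deadline is not None and time.monotonic() >= self.deadline
```

A global autoCommit maxTime then just becomes a default deadline applied to every add.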
[jira] Updated: (SOLR-248) Capitalization Filter Factory
[ https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-248: --- Attachment: SOLR-248-CapitalizationFilter.patch 1. Added better javadocs explaining the configuration. 2. Removed synchronized map. 3. Put the Filter as a package-private class in the Factory file -- since the filter relies on the factory, it is not particularly useful outside Solr. I would like to add this soon. Capitalization Filter Factory - Key: SOLR-248 URL: https://issues.apache.org/jira/browse/SOLR-248 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley Priority: Minor Attachments: SOLR-248-CapitalizationFilter.patch, SOLR-248-CapitalizationFilter.patch, SOLR-248-CapitalizationFilter.patch For tokens that are used in faceting, it is nice to have standard capitalization. I want Aerial views and Aerial Views to both be: Aerial Views
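The normalization goal of the filter can be shown in miniature. This sketch only illustrates the intent described in the issue ('Aerial views' and 'aerial Views' both becoming 'Aerial Views'); the patch's configurable rules are more involved.

```python
def capitalize_value(value):
    """Normalize a facet value to one canonical capitalization:
    first letter of each word upper-cased, the rest lower-cased."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in value.split())

print(capitalize_value("Aerial views"))   # Aerial Views
print(capitalize_value("aerial Views"))   # Aerial Views
```

Running this at index time on facet fields means variant capitalizations collapse into a single facet bucket.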
[jira] Assigned: (SOLR-248) Capitalization Filter Factory
[ https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley reassigned SOLR-248: -- Assignee: Ryan McKinley
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512613 ]

Ryan McKinley commented on SOLR-139:
------------------------------------
The update handler could call the processor when it was time to do the manipulation too. What are you thinking? Adding the processor as a parameter to AddUpdateCommand?

> ... ParallelReader, where some fields are in one sub-index ...

The processor would ask the updateHandler for the existing document - the updateHandler deals with getting it to/from the right place. We could add something like:

  Document getDocumentFromPendingOrCommited( String indexId )

to UpdateHandler and then that is taken care of. Other than extracting the old document, what needs to be done that can't be done in the processor?

Support updateable/modifiable documents
Key: SOLR-139
URL: https://issues.apache.org/jira/browse/SOLR-139
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Attachments: SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch

It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers.

for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
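The proposed method only exists as a signature in the comment above; a self-contained sketch of the behavior it implies, where the handler hides whether the existing document comes from pending (uncommitted) adds or the committed index. All names besides the proposed method are hypothetical, and `Doc` is a stand-in for Lucene's Document:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lookup Ryan proposes adding to UpdateHandler.
// Only getDocumentFromPendingOrCommited comes from the comment above; the
// rest (Doc, addPending, commit, the backing maps) is illustrative.
public class PendingOrCommittedSketch {
    public static class Doc {
        public final String id;
        public Doc(String id) { this.id = id; }
    }

    private final Map<String, Doc> pending = new HashMap<>();
    private final Map<String, Doc> committed = new HashMap<>();

    public void addPending(String id, Doc doc) { pending.put(id, doc); }

    // Simulate a commit: pending documents become part of the index.
    public void commit() {
        committed.putAll(pending);
        pending.clear();
    }

    // Check uncommitted adds first, then fall back to the committed index,
    // so callers never care where the current version of the doc lives.
    public Doc getDocumentFromPendingOrCommited(String indexId) {
        Doc d = pending.get(indexId);
        return (d != null) ? d : committed.get(indexId);
    }
}
```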
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512617 ]

Yonik Seeley commented on SOLR-139:
-----------------------------------
> ... ParallelReader, where some fields are in one sub-index ... the processor would ask the updateHandler for the existing document - the updateHandler deals with getting it to/from the right place.

The big reason you would use ParallelReader is to avoid touching the less-modified/bigger fields in one index when changing some of the other fields in the other index.

> What are you thinking? Adding the processor as a parameter to AddUpdateCommand?

I didn't have a clear alternative... I was just pointing out the future pitfalls of assuming too much implementation knowledge.

Support updateable/modifiable documents
Key: SOLR-139
URL: https://issues.apache.org/jira/browse/SOLR-139
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Attachments: SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch

It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers.

for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512626 ]

Hoss Man commented on SOLR-258:
-------------------------------
> the big problem being that I doubt the SolrQueryRequest is always available everywhere it's needed.

...exactly, at the moment all of the date parsing is done inside DateField. i think i'll try refactoring it so that DateMathParser does *all* the parsing, and make DateField delegate to it in the non-trivial case.

the problem that's still a pain to solve is getting all concepts of NOW to be the same for a request ... things like an fq=f:[NOW TO NOW+1DAY] are handled by DateField via a query parser ... i can't think of an easy way to make that consistent with the facet parsing definition of NOW (without resorting to a ThreadLocal)

Date based Facets
Key: SOLR-258
URL: https://issues.apache.org/jira/browse/SOLR-258
Project: Solr
Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch

1) Allow clients to express concepts like...
* give me facet counts per day for every day this month.
* give me facet counts per hour for every hour of today.
* give me facet counts per hour for every hour of a specific day.
* give me facet counts per hour for every hour of a specific day, and give me facet counts for the number of matches before that day, or after that day.

2) Return all data in a way that makes it easy to use to build filter queries on those date ranges.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
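Solr's actual DateMathParser isn't shown in this thread; a minimal sketch of evaluating date math against an explicit "now" instant, illustrating the consistency problem above: rounding and offsets like NOW/DAY and NOW+1DAY are only stable within a request if every piece of code (facet parsing, query parsing) starts from the same instant. Class and method names are illustrative:

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Illustrative sketch, not Solr's DateMathParser: evaluate simple date
// math relative to a caller-supplied "now". Passing the same Date to
// every call is the moral equivalent of fixing NOW once per request.
public class DateMathSketch {
    // "NOW/DAY": round the given instant down to the start of its UTC day.
    public static Date roundDownToDay(Date now) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTime(now);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    // "NOW+nDAYS": add n days to the given instant.
    public static Date addDays(Date now, int n) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTime(now);
        cal.add(Calendar.DAY_OF_MONTH, n);
        return cal.getTime();
    }
}
```

Two components that each call `new Date()` themselves instead of sharing one instant can disagree on which day "NOW/DAY" falls in when the request straddles midnight; threading one instant through the request avoids that.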
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512628 ]

Ryan McKinley commented on SOLR-139:
------------------------------------
> ... avoid touching the less-modified/bigger fields ...

aaah, perhaps a future updateHandler getDocument() function could take a list of fields it should extract. Still, there are problems with what to do when you add it... maybe it checks if anything has changed in the less-modified index? I see your point.

> What are you thinking? Adding the processor as a parameter to AddUpdateCommand?
> I didn't have a clear alternative... I was just pointing out the future pitfalls of assuming too much implementation knowledge.

I am fine either way -- in the UpdateHandler or the Processors. Request plumbing-wise, it feels the most natural in a processor. But if we rework the AddUpdateCommand it could fit there too. I don't know if it is an advantage or a disadvantage to have the 'modify' parameters tied to the command or to the request parameters. Either way has its +/-, with no real winner (or loser) IMO.

In the end, I want to make sure that I never need a custom UpdateHandler (80% of it is greek to me), but can easily change the 'modify' logic.

Support updateable/modifiable documents
Key: SOLR-139
URL: https://issues.apache.org/jira/browse/SOLR-139
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Attachments: SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch

It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers.

for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (SOLR-139) Support updateable/modifiable documents
On 13-Jul-07, at 1:53 PM, Yonik Seeley (JIRA) wrote:
>> ... ParallelReader, where some fields are in one sub-index ... the processor would ask the updateHandler for the existing document - the updateHandler deals with getting it to/from the right place.
>
> The big reason you would use ParallelReader is to avoid touching the less-modified/bigger fields in one index when changing some of the other fields in the other index.

I've pondered this a few times: it could be a huge win for highlighting apps, which can be stored-field-heavy. However, I wonder if there is something that I am missing: PR requires perfect synchro of lucene doc ids, no? If you update fields for a doc in one index, wouldn't you need to (re-)store the fields in all the other indices too, to keep the doc ids in sync?

-mike
Re: [jira] Commented: (SOLR-139) Support updateable/modifiable documents
On 7/13/07, Mike Klaas [EMAIL PROTECTED] wrote:
>>> ... ParallelReader, where some fields are in one sub-index ... the processor would ask the updateHandler for the existing document - the updateHandler deals with getting it to/from the right place.
>>
>> The big reason you would use ParallelReader is to avoid touching the less-modified/bigger fields in one index when changing some of the other fields in the other index.
>
> I've pondered this a few times: it could be a huge win for highlighting apps, which can be stored-field-heavy. However, I wonder if there is something that I am missing: PR requires perfect synchro of lucene doc ids, no? If you update fields for a doc in one index, wouldn't you need to (re-)store the fields in all the other indices too, to keep the doc ids in sync?

Well, it would be tricky... one PR usecase would be to entirely re-index one field (in its own separate index), thus maintaining synchronization with the main index. As Doug said, ParallelReader was not really designed to support incremental updates of fields, but rather to accelerate batch updates. For incremental updates you're probably better served by updating a single index. That's probably not too useful for a general purpose platform like Solr.

Another way to support a more incremental model is perhaps to split up the smaller volatile index into many segments, so that updating a single doc involves rewriting just that segment. There might also be possibilities in different types of IndexReader implementations: one could map docids to maintain synchronization. This brings up a slightly different problem: lucene scorers expect to go in docid order.

-Yonik
Re: nightly builds / solrj-lib
: http://people.apache.org/builds/lucene/solr/nightly/
:
: the dist directory does not include the .jar files needed for solrj.
: Can we modify the script to include 'solrj-lib'?

if it's not in the nightly releases, then they won't make it into the official releases either -- the nightly.sh just renames the standard tgz/zip release files.

it looks like the problem is that the tarfileset used by the package target only includes jar and war files from dist (not subdirs) ... i don't see any reason why it shouldn't include dist/* ... so try changing that and see if the artifacts from ant package start including the solrj stuff.

-Hoss
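Solr's actual build.xml isn't quoted here, so the attribute values below are guesses at its property names; but the kind of change Hoss suggests, using Ant's tarfileset, would look roughly like:

```xml
<!-- Before (sketch): only top-level jars/wars from dist get packaged,
     so dist/solrj-lib is silently skipped. -->
<tarfileset dir="${dist}" prefix="${fullnamever}/dist"
            includes="*.jar *.war"/>

<!-- After (sketch): include everything under dist, which picks up
     the solrj-lib subdirectory as well. -->
<tarfileset dir="${dist}" prefix="${fullnamever}/dist"
            includes="**"/>
```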