[jira] Commented: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser
[ https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850508#action_12850508 ]

Otis Gospodnetic commented on SOLR-896:
---------------------------------------

This looks super straightforward. The only problem is that Qsol itself seems to be gone. Mark, any way you can put Qsol somewhere? Maybe just attach the jar to this issue?

Solr Query Parser Plugin for Mark Miller's Qsol Parser
------------------------------------------------------
Key: SOLR-896
URL: https://issues.apache.org/jira/browse/SOLR-896
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Chris Harris
Attachments: SOLR-896.patch, SOLR-896.patch

An extremely basic plugin to get the Qsol query parser (http://www.myhardshadow.com/qsol.php) working in Solr.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Re: lucene and solr trunk
+1 for this structure and this set of steps.

Otis

----- Original Message -----
From: Chris Hostetter <hossman_luc...@fucit.org>
To: solr-dev@lucene.apache.org
Sent: Tue, March 16, 2010 6:46:19 PM
Subject: Re: lucene and solr trunk

: Otis, yes, I think so, eventually. But that's gonna take much more discussion.
:
: I don't think this initial cutover should try to solve how modules
: will be organized, yet... we'll get there, eventually.

But we should at least consider it, and not move in a direction that's distinct from the ultimate goal of better refactoring (especially since that was one of the main goals of unifying development efforts).

Here's my concrete suggestion that could be done today (for simplicity: $svn = https://svn.apache.org/repos/asf/lucene)...

  svn mv $svn/java/trunk $svn/java/tmp-migration
  svn mkdir $svn/java/trunk
  svn mv $svn/solr/trunk $svn/java/trunk/solr
  svn mv $svn/java/tmp-migration $svn/java/trunk/core

At which point:

0. People who want to work only on Lucene-Java can start checking out $svn/java/trunk/core (I'm pretty sure existing checkouts will continue to work w/o any changes; the svn info should just update itself)
1. Build files can be added to (the new) $svn/java/trunk to build ./core followed by ./solr
2. The build files in $svn/java/trunk/solr can be modified to look at ../core/ to find Lucene jars
3. People who care about Solr (including all committers) should start checking out and building all of $svn/java/trunk
4. Long term, we could choose to branch all of $svn/java/trunk for releases... AND/OR we could choose to branch specific modules (i.e. solr) independently (with modifications to the build files on those branches to pull in their dependencies from alternate locations)
5. Long term, we can start refactoring additional modules out of $svn/java/trunk/solr and $svn/java/trunk/core (like $svn/java/trunk/core/contrib) into their own directory in $svn/java/trunk
6. Long term, people who want to work on more than just core but don't care about certain modules (like solr) can do a simple non-recursive checkout of $svn/java/trunk and then do full checkouts of whatever modules they care about

(Please note: I'm just trying to list things we *could* do if we go this route; I'm not advocating that we *should* do any of these things.)

I can't think of any objections people have raised to any of the previous suggestions which apply to this suggestion. Is there anything people can think of that would be useful, but not possible, if we go this route?

-Hoss
[jira] Commented: (SOLR-1822) SEVERE: Unable to move index file from: tempfile to: indexfile
[ https://issues.apache.org/jira/browse/SOLR-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846099#action_12846099 ]

Otis Gospodnetic commented on SOLR-1822:
----------------------------------------

When Solr starts, doesn't it create the index directory? If so, this patch is not needed, unless we want to make sure replication succeeds even if someone/something removes the whole index directory on a slave after the slave had already started. Is this really needed?

SEVERE: Unable to move index file from: tempfile to: indexfile
--------------------------------------------------------------
Key: SOLR-1822
URL: https://issues.apache.org/jira/browse/SOLR-1822
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: Linux, JDK6, Solr 1.4
Reporter: wyhw whon
Priority: Critical
Fix For: 1.5
Attachments: SnapPuller.patch

The Solr index directory was removed, but I do not know the reason for this. Adding some code at line 577 of SnapPuller.java can resolve this bug:

  line 576  File indexFileInIndex = new File(indexDir, fname);
  +         if (!indexDir.exists()) indexDir.mkdir();
            boolean success = indexFileInTmpDir.renameTo(indexFileInIndex);

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1375) BloomFilter on a field
[ https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846139#action_12846139 ]

Otis Gospodnetic commented on SOLR-1375:
----------------------------------------

Heh, with the Lucene/Solr merge taking place now, my previous comment above makes even more sense. What do you think?

BloomFilter on a field
----------------------
Key: SOLR-1375
URL: https://issues.apache.org/jira/browse/SOLR-1375
Project: Solr
Issue Type: New Feature
Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch
Original Estimate: 120h
Remaining Estimate: 120h

* A bloom filter is a read-only probabilistic set. It's useful for verifying that a key exists in a set, though it can return false positives. http://en.wikipedia.org/wiki/Bloom_filter
* The use case is indexing in Hadoop and checking for duplicates against a Solr cluster, which (when using the term dictionary or a query) is too slow and exceeds the time consumed for indexing. When a match is found, the host, segment, and term are returned. If the same term is found on multiple servers, multiple results are returned by the distributed process. (We'll need to add in the core name, I just realized.)
* When new segments are created and commit is called, a new bloom filter is generated from a given field (default: id) by iterating over the term dictionary values. There's a bloom filter file per segment, which is managed on each Solr shard. When segments are merged away, their corresponding .blm files are also removed. In a future version we'll have a central server for the bloom filters so we're not abusing the thread pool of the Solr proxy and the networking of the Solr cluster (this will be done sooner rather than later after testing this version). I held off because the central server requires syncing the Solr servers' files (which is like reverse replication).
* The patch uses the BloomFilter from Hadoop 0.20. I want to jar up only the necessary classes so we don't have a giant Hadoop jar in lib. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
* Distributed code is added and seems to work; I extended TestDistributedSearch to test over multiple HTTP servers. I chose this approach rather than the manual method used by (for example) TermVectorComponent.testDistributed because I'm new to Solr's distributed search and wanted to learn how it works (the stages are confusing). Using this method, I didn't need to set up multiple Tomcat servers and manually execute tests.
* We need more of the bloom filter options passable via solrconfig.
* I'll add more test cases.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
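The per-segment duplicate check the issue describes can be sketched in miniature (illustrative Python only; the actual patch uses Hadoop's org.apache.hadoop.util.bloom.BloomFilter, and the field values and sizes here are made up):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # big int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from salted digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits >> pos & 1 for pos in self._positions(key))

# Index-time: populate one filter per segment from the "id" term dictionary.
segment_filter = BloomFilter()
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    segment_filter.add(doc_id)

# Hadoop-side duplicate check: a negative answer is definitive.
print(segment_filter.might_contain("doc-2"))   # → True (no false negatives)
```

A negative `might_contain` lets the Hadoop indexer skip a round trip to the Solr cluster, which is the speedup the issue is after.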
Re: lucene and solr trunk
Hi,

Check out the dir structure mentioned here: http://markmail.org/message/gwpmaevw7tavqqge

Isn't that what we want?

Otis
--
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message -----
From: Mark Miller <markrmil...@gmail.com>
To: solr-dev@lucene.apache.org
Sent: Mon, March 15, 2010 11:43:48 PM
Subject: Re: lucene and solr trunk

On 03/15/2010 11:28 PM, Yonik Seeley wrote:
> So, we have a few options on where to put Solr's new trunk:
> Solr moves to Lucene's trunk: /java/trunk, /java/trunk/solr

+1. With the goal of merged dev, merged tests, this looks the best to me. Simple to do patches that span both, simple to set up Solr to use Lucene trunk rather than jars. Short paths. Simple. I like it.

--
- Mark

http://www.lucidimagination.com
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839907#action_12839907 ]

Otis Gospodnetic commented on SOLR-1553:
----------------------------------------

What does the "u" in "uf" stand for?

extended dismax query parser
----------------------------
Key: SOLR-1553
URL: https://issues.apache.org/jira/browse/SOLR-1553
Project: Solr
Issue Type: New Feature
Reporter: Yonik Seeley
Fix For: 1.5
Attachments: edismax.unescapedcolon.bug.test.patch, edismax.userFields.patch, SOLR-1553.patch, SOLR-1553.pf-refactor.patch

An improved user-facing query parser based on dismax.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1375) BloomFilter on a field
[ https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838446#action_12838446 ]

Otis Gospodnetic commented on SOLR-1375:
----------------------------------------

{quote}
When new segments are created and commit is called, a new bloom filter is generated from a given field (default: id) by iterating over the term dictionary values. There's a bloom filter file per segment, which is managed on each Solr shard. When segments are merged away, their corresponding .blm files are also removed.
{quote}

Doesn't this hint that some of this stuff (I haven't looked at the patch) really needs to live in Lucene index-segment-merging land?

BloomFilter on a field
----------------------
Key: SOLR-1375
URL: https://issues.apache.org/jira/browse/SOLR-1375
Project: Solr
Issue Type: New Feature
Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch
Original Estimate: 120h
Remaining Estimate: 120h

* A bloom filter is a read-only probabilistic set. It's useful for verifying that a key exists in a set, though it can return false positives. http://en.wikipedia.org/wiki/Bloom_filter
* The use case is indexing in Hadoop and checking for duplicates against a Solr cluster, which (when using the term dictionary or a query) is too slow and exceeds the time consumed for indexing. When a match is found, the host, segment, and term are returned. If the same term is found on multiple servers, multiple results are returned by the distributed process. (We'll need to add in the core name, I just realized.)
* When new segments are created and commit is called, a new bloom filter is generated from a given field (default: id) by iterating over the term dictionary values. There's a bloom filter file per segment, which is managed on each Solr shard. When segments are merged away, their corresponding .blm files are also removed. In a future version we'll have a central server for the bloom filters so we're not abusing the thread pool of the Solr proxy and the networking of the Solr cluster (this will be done sooner rather than later after testing this version). I held off because the central server requires syncing the Solr servers' files (which is like reverse replication).
* The patch uses the BloomFilter from Hadoop 0.20. I want to jar up only the necessary classes so we don't have a giant Hadoop jar in lib. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
* Distributed code is added and seems to work; I extended TestDistributedSearch to test over multiple HTTP servers. I chose this approach rather than the manual method used by (for example) TermVectorComponent.testDistributed because I'm new to Solr's distributed search and wanted to learn how it works (the stages are confusing). Using this method, I didn't need to set up multiple Tomcat servers and manually execute tests.
* We need more of the bloom filter options passable via solrconfig.
* I'll add more test cases.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1788) Remove duplicate field in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-1788.
------------------------------------

Resolution: Won't Fix

Please email questions to the solr-user list.

Remove duplicate field in schema.xml
------------------------------------
Key: SOLR-1788
URL: https://issues.apache.org/jira/browse/SOLR-1788
Project: Solr
Issue Type: New Feature
Reporter: Bill Bell

Is there a way to remove duplicates in a multiValued field? For example, if I add the following, is there a way to remove the duplicates? If not directly in schema.xml, how about in DIH?

  <arr name="options">
    <str>Full Bathrooms = 2</str>
    <str>Bedrooms = 2</str>
    <str>Bedrooms = 2</str>
    <str>Full Bathrooms = 2</str>
    <str>Property Address = Orange,92805</str>
    <str>Property Type = Apartments</str>
  </arr>

This would be changed to:

  <arr name="options">
    <str>Bedrooms = 2</str>
    <str>Full Bathrooms = 2</str>
    <str>Property Address = Orange,92805</str>
    <str>Property Type = Apartments</str>
  </arr>

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
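The requested dedupe is straightforward to express as update-time logic, e.g. in a custom UpdateRequestProcessor. A minimal sketch in plain Python (not a Solr API; values taken from the example above — note this keeps first occurrences, whereas the issue's "changed to" list also happens to be sorted):

```python
def dedupe_multivalued(values):
    """Remove duplicate values from a multiValued field, keeping the
    first occurrence of each value in its original order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

options = [
    "Full Bathrooms = 2",
    "Bedrooms = 2",
    "Bedrooms = 2",
    "Full Bathrooms = 2",
    "Property Address = Orange,92805",
    "Property Type = Apartments",
]
print(dedupe_multivalued(options))
```

The same loop dropped into an update processor's processAdd step would rewrite the field before the document is indexed.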
[jira] Commented: (SOLR-1719) stock TokenFilterFactory for flattening positions
[ https://issues.apache.org/jira/browse/SOLR-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802342#action_12802342 ]

Otis Gospodnetic commented on SOLR-1719:
----------------------------------------

Does PositionFilterFactory fix the problem?

stock TokenFilterFactory for flattening positions
-------------------------------------------------
Key: SOLR-1719
URL: https://issues.apache.org/jira/browse/SOLR-1719
Project: Solr
Issue Type: Wish
Reporter: Hoss Man

People seem to occasionally be confused by why certain inputs result in PhraseQueries instead of BooleanQueries:

  http://old.nabble.com/Understanding-the-query-parser-to27071483.html
  http://old.nabble.com/Tokenizer-question-to27099119.html

It would probably be handy if there was a TokenFilterFactory provided out of the box that just set the positionIncrement of every token to 0, to deal with situations where people don't care about term positions at query time and are just using tokenization/analysis as a way to split up some input string into multiple SHOULD clauses for a BooleanQuery.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
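For illustration, the flattening being asked for amounts to zeroing position increments so that all terms land at the same position and the query parser emits SHOULD clauses rather than a PhraseQuery. A toy model (hypothetical (term, increment) pairs, not Lucene's TokenStream API; here the first token keeps its increment, as Lucene's PositionFilter does by default):

```python
def flatten_positions(tokens):
    """Set the position increment of every token after the first to 0,
    collapsing all terms onto a single position."""
    return [(term, incr if i == 0 else 0)
            for i, (term, incr) in enumerate(tokens)]

# (term, position_increment) pairs as a tokenizer might emit them:
tokens = [("wi", 1), ("fi", 1)]
print(flatten_positions(tokens))  # → [('wi', 1), ('fi', 0)]
```

With both terms at one position, a parser that builds PhraseQueries from multi-position analysis output would instead see a single position and fall back to OR-ing the terms.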
[jira] Resolved: (SOLR-577) added support for boosting fields and documents to python solr interface
[ https://issues.apache.org/jira/browse/SOLR-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-577.
-----------------------------------

Resolution: Won't Fix

Closing per comment.

added support for boosting fields and documents to python solr interface
------------------------------------------------------------------------
Key: SOLR-577
URL: https://issues.apache.org/jira/browse/SOLR-577
Project: Solr
Issue Type: Improvement
Components: clients - python
Environment: linux, python
Reporter: Rob Young
Attachments: solr.py

Added the ability to set boosts on fields and documents when indexing. This is done through two new classes, solr.Document and solr.Field:

  c = solr.SolrConnection(host='localhost:8081')
  c.add(id='123', name=solr.Field('this is a field', boost=1.5))
  doc = solr.Document(boost=1.5)
  doc.add(solr.Field(name='title', value='a value for my field', boost=1.1))
  c.addDoc(doc)

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-216) Improvements to solr.py
[ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-216.
-----------------------------------

Resolution: Won't Fix

Closing per comment.

Improvements to solr.py
-----------------------
Key: SOLR-216
URL: https://issues.apache.org/jira/browse/SOLR-216
Project: Solr
Issue Type: Improvement
Components: clients - python
Affects Versions: 1.2
Reporter: Jason Cater
Assignee: Mike Klaas
Priority: Trivial
Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py

I've taken the original solr.py code and extended it to include higher-level functions.

* Requires python 2.3+
* Supports the SSL (https://) scheme
* Conforms (mostly) to PEP 8 -- the Python Style Guide
* Provides a high-level results object with implicit data type conversion
* Supports batching of update commands

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.
[ https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800816#action_12800816 ]

Otis Gospodnetic commented on SOLR-758:
---------------------------------------

Is this still needed with enhanced dismax now available?

Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.
-----------------------------------------------------------------------------------------------------
Key: SOLR-758
URL: https://issues.apache.org/jira/browse/SOLR-758
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.3
Reporter: David Smiley
Fix For: 1.5
Attachments: AdvancedQParserPlugin.java, AdvancedQParserPlugin.java, DisMaxQParserPlugin.java, DisMaxQParserPlugin.java, UserQParser.java, UserQParser.java, UserQParser.java-umlauts.patch

The DisMaxQParserPlugin has a variety of nice features; chief among them is that it uses the DisjunctionMaxQueryParser. However, it imposes limitations on the syntax. I've enhanced the DisMax QParser plugin to use a pluggable query string re-writer (via subclass extension) instead of hard-coding the logic currently embedded within it (i.e. the "escape nearly everything" logic). Additionally, I've made this QParser have a notion of a simple syntax (the default) or non-simple, in which case some of the logic in this QParser doesn't occur because it's irrelevant (phrase boosting and min-should-match in particular). As part of my work I significantly moved the code around to make it clearer and more extensible. I also chose to rename it to suggest its role as a parser for user queries. Attachment to follow...

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793300#action_12793300 ]

Otis Gospodnetic commented on SOLR-773:
---------------------------------------

Dave - useful, thanks! Do you think creating/editing a Wiki page with this information would be good? See: http://wiki.apache.org/solr/LocalSolr

Incorporate Local Lucene/Solr
-----------------------------
Key: SOLR-773
URL: https://issues.apache.org/jira/browse/SOLR-773
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5
Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz

Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789379#action_12789379 ]

Otis Gospodnetic commented on SOLR-1632:
----------------------------------------

I didn't look at the patch, but from your comments it looks like you already have that one merged big idf map, which is really what I was aiming at, so that's good! I was just thinking that this map (file) would be periodically updated and pushed to slaves, so that slaves can compute the global IDF *locally* instead of making any kind of extra requests.

Distributed IDF
---------------
Key: SOLR-1632
URL: https://issues.apache.org/jira/browse/SOLR-1632
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.5
Reporter: Andrzej Bialecki
Attachments: distrib.patch

Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
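The slave-local computation described in the comment could be sketched like this (illustrative only; the replicated file format, the variable names, and the use of the classic Lucene-style idf formula are all assumptions, not the patch's API):

```python
import math

# Periodically merged on the master and pushed to every slave:
# the global document count and a per-term document-frequency map.
global_doc_count = 1000
global_df = {"solr": 120, "lucene": 300}

def global_idf(term):
    """Lucene-style idf, computed locally from the replicated df map,
    so no extra per-query requests to other shards are needed."""
    df = global_df.get(term, 0)
    return 1.0 + math.log(global_doc_count / (df + 1))

print(round(global_idf("solr"), 3))  # → 3.112
```

Each slave scores with the same idf values, which is the point: non-uniform shards stop disagreeing about term rarity.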
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789120#action_12789120 ]

Otis Gospodnetic commented on SOLR-1632:
----------------------------------------

What about this approach: http://markmail.org/message/mjfmpzfspguepixx ?

Distributed IDF
---------------
Key: SOLR-1632
URL: https://issues.apache.org/jira/browse/SOLR-1632
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.5
Reporter: Andrzej Bialecki
Attachments: distrib.patch

Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785694#action_12785694 ]

Otis Gospodnetic commented on SOLR-1277:
----------------------------------------

How about this idea for what to do with the default core name: what if the default/empty-named core always pointed to the Solr admin/dashboard page, something that shows all the info about the system (pulled from ZK)?

Implement a Solr specific naming service (using Zookeeper)
----------------------------------------------------------
Key: SOLR-1277
URL: https://issues.apache.org/jira/browse/SOLR-1277
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5
Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
Original Estimate: 672h
Remaining Estimate: 672h

The goal is to give Solr server clusters self-healing attributes, where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and go from there.

Features:
* Automatic failover (i.e. when a server fails, clients stop trying to index to or search it)
* Centralized configuration management (i.e. a new solrconfig.xml or schema.xml propagates to a live Solr cluster)
* Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult, but not impossible?

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776661#action_12776661 ]

Otis Gospodnetic commented on SOLR-1553:
----------------------------------------

I think you need to click on the Issue Links link, delete, and re-link. I have a feeling once this is in, people won't need the original dismax. Yonik, did you mean to attach a patch, but forgot?

extended dismax query parser
----------------------------
Key: SOLR-1553
URL: https://issues.apache.org/jira/browse/SOLR-1553
Project: Solr
Issue Type: New Feature
Reporter: Yonik Seeley
Fix For: 1.5

An improved user-facing query parser based on dismax.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1550) statistics for request handlers should report std dev
[ https://issues.apache.org/jira/browse/SOLR-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775163#action_12775163 ]

Otis Gospodnetic commented on SOLR-1550:
----------------------------------------

Haven't tried the patch yet; just had a quick look at it in a browser. It looks like it has tabs? (They should be replaced by 2 spaces.) Thanks!

statistics for request handlers should report std dev
-----------------------------------------------------
Key: SOLR-1550
URL: https://issues.apache.org/jira/browse/SOLR-1550
Project: Solr
Issue Type: Improvement
Reporter: Mike Anderson
Priority: Trivial
Attachments: SOLR-1550.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score
[ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774053#action_12774053 ]

Otis Gospodnetic commented on SOLR-1537:
----------------------------------------

The ID here being the uniqueKey? i.e. the use case is the removal of dupes when the same document is indexed in multiple shards and more than one shard returns that document in the result set?

Dedupe Sharded Search Results by Shard Order or Score
-----------------------------------------------------
Key: SOLR-1537
URL: https://issues.apache.org/jira/browse/SOLR-1537
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4, 1.5
Environment: All
Reporter: Dennis Kubes
Fix For: 1.5
Attachments: solr-dedupe-20091031-2.patch, solr-dedupe-20091031.patch

Allows sharded search results to dedupe results by ID based on either the order of the shards in the shards param or by score. This allows the result returned to be deterministic. If by shards, then shards that appear first in the shards param have a higher precedence than shards that appear later. If by score, then higher scores beat out lower scores. This doesn't allow multiple duplicates because currently Solr only permits a single result per ID to be returned.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
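The two dedupe policies the issue describes can be sketched side by side (hypothetical (shard, uniqueKey, score) tuples, not Solr's internal result types; shard-order precedence is modeled by list order):

```python
def dedupe_by_shard_order(results):
    """Keep the first hit per uniqueKey; results are assumed to be
    ordered by the shards param, so earlier shards win."""
    seen, out = set(), []
    for shard, doc_id, score in results:
        if doc_id not in seen:
            seen.add(doc_id)
            out.append((shard, doc_id, score))
    return out

def dedupe_by_score(results):
    """Keep the highest-scoring hit per uniqueKey."""
    best = {}
    for shard, doc_id, score in results:
        if doc_id not in best or score > best[doc_id][2]:
            best[doc_id] = (shard, doc_id, score)
    return list(best.values())

results = [("shard1", "doc-1", 0.4), ("shard2", "doc-1", 0.9), ("shard2", "doc-2", 0.5)]
print(dedupe_by_shard_order(results))  # → [('shard1', 'doc-1', 0.4), ('shard2', 'doc-2', 0.5)]
print(dedupe_by_score(results))
```

Note the two policies disagree on doc-1 here, which is exactly why the patch makes the choice explicit and deterministic.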
[jira] Commented: (SOLR-1536) Support for TokenFilters that may modify input documents
[ https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774057#action_12774057 ]

Otis Gospodnetic commented on SOLR-1536:
----------------------------------------

Is this better than writing a custom UpdateRequestProcessor that takes the value of the incoming SolrInputDocument (SID), does something to it, removes the original field, and adds the modified version back to the SID?

Support for TokenFilters that may modify input documents
--------------------------------------------------------
Key: SOLR-1536
URL: https://issues.apache.org/jira/browse/SOLR-1536
Project: Solr
Issue Type: New Feature
Components: Analysis
Affects Versions: 1.5
Reporter: Andrzej Bialecki
Attachments: altering.patch

In some scenarios it's useful to be able to create or modify fields in the input document based on analysis of other fields of this document. This need arises e.g. when indexing multilingual documents, or when doing NLP processing such as NER. However, currently this is not possible to do. This issue provides an implementation of this functionality that consists of the following parts:

* DocumentAlteringFilterFactory - abstract superclass that indicates that TokenFilter-s created from this factory may modify fields in a SolrInputDocument.
* TypeAsFieldFilterFactory - example implementation that illustrates this concept, with a JUnit test.
* DocumentBuilder modifications to support this functionality.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Avro in Solr
Hello, Avro is still young, from what I know, but I'm wondering if anyone has any thoughts on whether there is a place or need for Avro in Solr? http://www.cloudera.com/blog/2009/11/02/avro-a-format-for-big-data/ Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
Re: Avro in Solr
I don't know yet.

Otis

----- Original Message -----
From: Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
To: solr-dev@lucene.apache.org
Sent: Tue, November 3, 2009 9:58:38 PM
Subject: Re: Avro in Solr

Structured formats have a lot of limitations when it comes to Solr. The number and name of fields in any document is completely arbitrary in Solr. Is it possible to represent such a data structure in Avro?

On Wed, Nov 4, 2009 at 3:43 AM, Otis Gospodnetic wrote:
> Hello,
>
> Avro is still young, from what I know, but I'm wondering if anyone has any thoughts on whether there is a place or need for Avro in Solr?
> http://www.cloudera.com/blog/2009/11/02/avro-a-format-for-big-data/
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
[jira] Resolved: (SOLR-1541) lowering ranking of certain documents while search
[ https://issues.apache.org/jira/browse/SOLR-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-1541.
------------------------------------

Resolution: Invalid

lowering ranking of certain documents while search
--------------------------------------------------
Key: SOLR-1541
URL: https://issues.apache.org/jira/browse/SOLR-1541
Project: Solr
Issue Type: Wish
Components: search
Reporter: arvind

The requirement is as below: Suppose there are some documents already stored in Solr. These documents/records belong to various sources like source1, source2, etc. (stored in a 'Source' Solr field). Now, when a user searches for documents (simple text search), is there any possibility in Solr that results from certain sources always come with a lower rank? (i.e., such sources always come in trailing pages). I believe there should be some way to do this with a function query, but I'm not sure! Any help on this is greatly appreciated. Thanks in advance.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1533) Partition data directories into multiple bucket directories
[ https://issues.apache.org/jira/browse/SOLR-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771480#action_12771480 ]

Otis Gospodnetic commented on SOLR-1533:
----------------------------------------

Trying to understand the need for this (I might have missed the discussion on the ML?). Isn't the creator of the core in control of the data dir ( http://wiki.apache.org/solr/CoreAdmin#CREATE ) and thus their distribution? Or is the goal of this to remove the logic and knowledge from the client and let Solr control where a core's data is going to be placed, depending on the core data distribution policy?

Partition data directories into multiple bucket directories
-----------------------------------------------------------
Key: SOLR-1533
URL: https://issues.apache.org/jira/browse/SOLR-1533
Project: Solr
Issue Type: New Feature
Components: multicore
Reporter: Shalin Shekhar Mangar
Fix For: 1.5

Provide a way to partition data directories into multiple bucket directories. For example, instead of creating 10,000 data directories inside one base data directory, Solr can assign a core to one of 4 base directories, thereby distributing them. The underlying problem is that with a large number of indexes, we see slower and slower system performance as one goes on increasing the number of cores, thereby increasing the number of directories in the single data directory.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
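A distribution policy like the one proposed might look like this (bucket count, hash choice, and path layout are all invented for illustration, not what the issue specifies):

```python
import hashlib

NUM_BUCKETS = 4  # e.g. 4 base directories instead of one

def data_dir_for_core(core_name, base="data"):
    """Deterministically assign a core's data dir to one of NUM_BUCKETS
    bucket directories, so no single directory accumulates
    thousands of entries."""
    digest = hashlib.md5(core_name.encode()).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    return f"{base}/bucket{bucket}/{core_name}"

print(data_dir_for_core("core0001"))
```

Hashing the core name keeps the assignment stable across restarts without any extra bookkeeping, which is one reasonable way to implement such a policy.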
[jira] Commented: (SOLR-1335) load core properties from a properties file
[ https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743310#action_12743310 ] Otis Gospodnetic commented on SOLR-1335: Mind including an example properties file, so we can see what's in it? load core properties from a properties file --- Key: SOLR-1335 URL: https://issues.apache.org/jira/browse/SOLR-1335 Project: Solr Issue Type: New Feature Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1335.patch, SOLR-1335.patch, SOLR-1335.patch There are a few ways of loading properties at runtime: # using a system property on the command line # if you use multicore, dropping it into solr.xml. If not, the only way is to keep a separate solrconfig.xml for each instance. #1 is error-prone if the user fails to start with the correct system property. In our case we have four different configurations for the same deployment, and we have to disable replication of solrconfig.xml. It would be nice if I could distribute four properties files so that our ops can drop in the right one and start Solr. It is also possible for operations to edit a properties file, but it is risky to edit solrconfig.xml if they don't understand Solr. I propose a properties file in the instanceDir named solrcore.properties. If present, it would be loaded and its entries added as core-specific properties. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
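A sketch of what such a file and its use might look like (the file name comes from the proposal; the property names and values are hypothetical):

```
# <instanceDir>/solrcore.properties
solr.data.dir=/var/data/core0
replication.enable.master=true
```

The properties would then be referenced from solrconfig.xml via the usual `${...}` substitution, e.g. `<dataDir>${solr.data.dir}</dataDir>`, so one solrconfig.xml can serve all four deployments.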
[jira] Commented: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738793#action_12738793 ] Otis Gospodnetic commented on SOLR-1274: Try: {code} if ("text".equals(extractFormat)) { {code} :) Provide multiple output formats in extract-only mode for tika handler - Key: SOLR-1274 URL: https://issues.apache.org/jira/browse/SOLR-1274 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1274.patch The proposed feature is to accept a URL parameter when using extract-only mode to specify an output format. This parameter might just overload the existing ext.extract.only so that one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the same response (i.e. xml remains the default). I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode, as if from the CLI with the tika jar: -x or --xml Output XHTML content (default) -h or --html Output HTML content -t or --text Output plain text content -m or --metadata Output only metadata However, looking at the docs and source, it seems that only the xml option is available (hard-coded) in ExtractingDocumentLoader.java {code} serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true)); {code} Providing at least a plain-text response seems to work if you change the serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
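Under the proposed overloading of ext.extract.only, requests might look like this (URL, file name, and parameter values are illustrative, not a committed API):

```
# XML response (the current hard-coded behavior; equivalent to ext.extract.only=true)
curl "http://localhost:8983/solr/update/extract?ext.extract.only=xml" -F "file=@doc.pdf"

# plain-text extraction, as the issue proposes
curl "http://localhost:8983/solr/update/extract?ext.extract.only=text" -F "file=@doc.pdf"
```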
Re: trie fields default in example schema
Would it make sense to instead add a new tint(eger) type, rather than renaming integer to pinteger? (thinking about people upgrading to Solr 1.4). Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-dev@lucene.apache.org Sent: Sunday, August 2, 2009 3:01:09 PM Subject: trie fields default in example schema I'm working on a jumbo trie patch (just many smaller trie related issues at once) - SOLR-1288. Anyway, I think support will be good enough for 1.4 that we should make types like integer in the example schema be based on the trie fields. The current integer fields should be renamed to pinteger (for plain integer), and have a recommended use only for compatibility with other/older indexes. People have mistakenly used the plain integer in the past based on the name, so I think we should fix the naming. The trie based fields should have a lower memory footprint in the fieldcache and are faster for a lookup (the only reason to use plain ints in the past)... sint uses StringIndex for historical reasons - we had no other option... we could upgrade the existing sint fields, but it wouldn't be quite 100% compatible and there's little reason since we have the trie fields now. -Yonik http://www.lucidimagination.com
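In schema.xml terms, the two proposals amount to something like the following sketch (class and attribute values are an assumption based on the 1.4-era trie work, not the final committed example schema):

```xml
<!-- trie-based int: smaller fieldcache footprint, faster range queries -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true"/>

<!-- plain int, kept only for compatibility with other/older indexes -->
<fieldType name="pinteger" class="solr.IntField" omitNorms="true"/>
```

Adding tint alongside the existing integer type (Otis' suggestion) avoids breaking upgraders' schemas; renaming integer to pinteger (Yonik's suggestion) steers new users toward the trie fields by default.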
[jira] Commented: (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737838#action_12737838 ] Otis Gospodnetic commented on SOLR-1293: Do you have any thoughts on handling the situation where each core belongs to a different party and each party has access *only* to its own core via Solr Admin (i.e. doesn't see all the other cores hosted by the instance)? Only the privileged administrator user can see and access all cores. Have you done any work on this or is this on your TODO? Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1293.patch Solr, currently, is not very suitable for a large number of homogeneous cores where you require fast/frequent loading/unloading of cores. Usually a core is required to be loaded just to fire a search query or to index one document. The requirements of such a system are: * Very efficient loading of cores. Solr cannot afford to read, parse, and create Schema and SolrConfig objects for each core each time the core has to be loaded (SOLR-919, SOLR-920) * START/STOP core. Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores. If a core is present and not loaded, and a request comes for it, load it automatically before serving the request * As there are a large number of cores, not all cores can be kept loaded at all times. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the number of cores is too high, all the cores' dataDirs cannot live in the same dir. There is an upper limit on the number of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737844#action_12737844 ] Otis Gospodnetic commented on SOLR-1293: OK, thanks. When you go to your Solr Admin page today, it lists all cores, even if there are 1 of them? Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic reassigned SOLR-908: - Assignee: Shalin Shekhar Mangar I won't get to it before going on vacation. Assigning to you if you want it. Port of Nutch CommonGrams filter to Solr - Key: SOLR-908 URL: https://issues.apache.org/jira/browse/SOLR-908 Project: Solr Issue Type: Wish Components: Analysis Reporter: Tom Burton-West Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch Phrase queries containing common words are extremely slow. We are reluctant to just use stop words due to various problems with false hits and some things becoming impossible to search with stop words turned on. (For example "to be or not to be", "the who", "man in the moon" vs "man on the moon", etc.) Several postings regarding slow phrase queries have suggested using the approach used by Nutch. Perhaps someone with more Java/Solr experience might take this on. It should be possible to port the Nutch CommonGrams code to Solr and create a suitable Solr FilterFactory so that it could be used in Solr by listing it in the Solr schema.xml. Construct n-grams for frequently occurring terms and phrases while indexing. Optimize phrase queries to use the n-grams. Single terms are still indexed too, with n-grams overlaid. http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
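Once ported, using the filter from schema.xml might look like the following sketch (the factory and attribute names are an assumption based on the Nutch port; check the final patch for the committed names):

```xml
<fieldType name="text_cg" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- forms bigrams for terms listed in commonwords.txt, e.g. "the_who",
         so phrase queries on common words hit far fewer postings -->
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt"/>
  </analyzer>
</fieldType>
```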
[jira] Commented: (SOLR-1305) Notification based replication instead of polling
[ https://issues.apache.org/jira/browse/SOLR-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734498#action_12734498 ] Otis Gospodnetic commented on SOLR-1305: With a little help from Zookeeper, is that the plan? Notification based replication instead of polling - Key: SOLR-1305 URL: https://issues.apache.org/jira/browse/SOLR-1305 Project: Solr Issue Type: New Feature Components: replication (java) Reporter: Noble Paul Fix For: 1.5 Currently the only way for the slave to know about the availability of new commit points is by polling. This means slaves should 'poll' very frequently to ensure that they get the commit point immediately. If the changes to the master are less frequent, then this can be an unnecessary overhead. It would be nice if the slave could register itself with the master for notification on availability of new changes. After receiving the notification, the slave can trigger a poll and do what it does now. This may require SOLR-727 so that the slave can register its url with the master -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1303) Wildcard queries on fields with LowerCaseFilterFactory not being lowercased.
[ https://issues.apache.org/jira/browse/SOLR-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-1303. Resolution: Invalid I think that's due to wildcard queries not being analyzed (and thus not lowercased to match your indexed tokens). Explanation is in the Lucene FAQ Wiki page. Wildcard queries on fields with LowerCaseFilterFactory not being lowercased. Key: SOLR-1303 URL: https://issues.apache.org/jira/browse/SOLR-1303 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Matt Schraeder Priority: Minor I have a field defined as follows: <fieldType name="keyword" class="solr.TextField" sortMissingLast="true" omitNorms="true"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.TrimFilterFactory"/> </analyzer> </fieldType> <field name="reviews" type="keyword" indexed="true" stored="true" multiValued="true"/> The data being indexed is a single letter followed by a space and a +, -, M, or A ... so basically two characters. When I do the following queries: reviews:K+ reviews:k+ I get results as expected. However, when I replace the + in the query with a * or ?, then the uppercase version no longer works, only the lowercase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1304) Make it possible to force replication of at least some of the config files even if the index hasn't changed
Make it possible to force replication of at least some of the config files even if the index hasn't changed --- Key: SOLR-1304 URL: https://issues.apache.org/jira/browse/SOLR-1304 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Otis Gospodnetic Priority: Minor Fix For: 1.5 From http://markmail.org/thread/vpk2fsjns7u2uopd Here is a use case: * Index is mostly static (nightly updates) * elevate.xml needs to be changed throughout the day * elevate.xml needs to be pushed to slaves and solr needs to reload it This is currently not possible because replication will happen only if the index changed in some way. You can't force a commit to fake index change. So one has to either: * add/delete dummy docs on master to force index change * write an external script that copies the config file to slaves -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1304) Make it possible to force replication of at least some of the config files even if the index hasn't changed
[ https://issues.apache.org/jira/browse/SOLR-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734332#action_12734332 ] Otis Gospodnetic commented on SOLR-1304: From Paul: +1 We should have a separate attribute in the master other than the standard <str name="confFiles">a.xml</str>, say <str name="realTimeConfFiles">b.xml</str>; the files specified in this can be replicated always, irrespective of the index Make it possible to force replication of at least some of the config files even if the index hasn't changed --- Key: SOLR-1304 URL: https://issues.apache.org/jira/browse/SOLR-1304 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Otis Gospodnetic Priority: Minor Fix For: 1.5 From http://markmail.org/thread/vpk2fsjns7u2uopd Here is a use case: * Index is mostly static (nightly updates) * elevate.xml needs to be changed throughout the day * elevate.xml needs to be pushed to slaves and solr needs to reload it This is currently not possible because replication will happen only if the index changed in some way. You can't force a commit to fake index change. So one has to either: * add/delete dummy docs on master to force index change * write an external script that copies the config file to slaves -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1304) Make it possible to force replication of at least some of the config files even if the index hasn't changed
[ https://issues.apache.org/jira/browse/SOLR-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734334#action_12734334 ] Otis Gospodnetic commented on SOLR-1304: Would it make more sense for the caller to specify (in the request) which files to replicate, thus giving it full control over what to replicate when? Maybe the realTimeConfFiles should then not list all conf files that should always be replicated, but instead list all the conf files that are *allowed* to be replicated when the caller requests some of them to be replicated? Make it possible to force replication of at least some of the config files even if the index hasn't changed --- Key: SOLR-1304 URL: https://issues.apache.org/jira/browse/SOLR-1304 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Otis Gospodnetic Priority: Minor Fix For: 1.5 From http://markmail.org/thread/vpk2fsjns7u2uopd Here is a use case: * Index is mostly static (nightly updates) * elevate.xml needs to be changed throughout the day * elevate.xml needs to be pushed to slaves and solr needs to reload it This is currently not possible because replication will happen only if the index changed in some way. You can't force a commit to fake index change. So one has to either: * add/delete dummy docs on master to force index change * write an external script that copies the config file to slaves -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
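For context, a sketch of the 1.4-style master configuration that the discussed attribute would extend; the realTimeConfFiles line is the hypothetical addition from this thread, not an existing option:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- existing behavior: replicated only when the index changes -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
    <!-- hypothetical (this issue): replicated regardless of index changes -->
    <str name="realTimeConfFiles">elevate.xml</str>
  </lst>
</requestHandler>
```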
Re: Forcing config replication without index change
OK, I'll create a JIRA for 1.5 tomorrow. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mark Miller markrmil...@gmail.com To: solr-dev@lucene.apache.org Sent: Thursday, July 16, 2009 11:37:52 AM Subject: Re: Forcing config replication without index change bq. Shouldn't it be possible to force replication of at least *some* of the config files even if the index hasn't changed? Indeed. Perhaps another call, forceIndexFetch? It replicates configs whether the index has changed or not, but wouldn't replicate the index if it didn't need to? Or a separate call altogether, fetchConfig, that just updates the configs? On Thu, Jul 16, 2009 at 3:00 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, Shouldn't it be possible to force replication of at least *some* of the config files even if the index hasn't changed? (see Paul Noble's comment on http://markmail.org/message/hgdwumfuuwixfxvq and the 4-message thread) Here is a use case: * Index is mostly static (nightly updates) * elevate.xml needs to be changed throughout the day * elevate.xml needs to be pushed to slaves and solr needs to reload it This is currently not possible because replication will happen only if the index changed in some way. You can't force a commit to fake index change. So one has to either: * add/delete dummy docs on master to force index change * write an external script that copies the config file to slaves Shouldn't it be possible to force replication of at least *some* of the config files even if the index hasn't changed? Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -- -- - Mark http://www.lucidimagination.com
Forcing config replication without index change
Hi, Shouldn't it be possible to force replication of at least *some* of the config files even if the index hasn't changed? (see Paul Noble's comment on http://markmail.org/message/hgdwumfuuwixfxvq and the 4-message thread) Here is a use case: * Index is mostly static (nightly updates) * elevate.xml needs to be changed throughout the day * elevate.xml needs to be pushed to slaves and solr needs to reload it This is currently not possible because replication will happen only if the index changed in some way. You can't force a commit to fake index change. So one has to either: * add/delete dummy docs on master to force index change * write an external script that copies the config file to slaves Shouldn't it be possible to force replication of at least *some* of the config files even if the index hasn't changed? Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
[jira] Commented: (SOLR-1041) dataDir is not set relative to instanceDir
[ https://issues.apache.org/jira/browse/SOLR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731994#action_12731994 ] Otis Gospodnetic commented on SOLR-1041: I worked around it by using a relative directory in instanceDir instead of using an absolute directory. I think one should be able to use either an absolute or a relative directory. If it matters, note that I don't have dataDir in the cores' solrconfig.xml files or in solr.xml, so Solr uses the default (data/) for that. dataDir is not set relative to instanceDir --- Key: SOLR-1041 URL: https://issues.apache.org/jira/browse/SOLR-1041 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-1041.patch, SOLR-1041.patch See the mail thread: http://markmail.org/thread/ebd7vumj3uyzpyt6 A recent bug fix has broken the feature. Now it is always relative to the current working directory for a single core -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
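The workaround described in the comment, sketched as a solr.xml fragment (core names and paths are illustrative):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- relative instanceDir works around the bug; with no explicit dataDir,
         Solr uses the default <instanceDir>/data -->
    <core name="core0" instanceDir="cores/core0"/>
  </cores>
</solr>
```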
[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2
[ https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731999#action_12731999 ] Otis Gospodnetic commented on SOLR-1275: Patch looks good to me (though I have not tested it) Add expungeDeletes to DirectUpdateHandler2 -- Key: SOLR-1275 URL: https://issues.apache.org/jira/browse/SOLR-1275 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Reporter: Jason Rutherglen Assignee: Noble Paul Priority: Trivial Fix For: 1.4 Attachments: SOLR-1275.patch Original Estimate: 48h Remaining Estimate: 48h expungeDeletes is a useful method, somewhat like optimize, offered by IndexWriter that can be implemented in DirectUpdateHandler2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
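With the patch applied, the option would presumably surface through the XML update syntax as a commit attribute, along these lines (a sketch; verify the attribute name against the committed patch):

```xml
<!-- merge away deleted documents from affected segments
     without the full cost of an optimize -->
<commit expungeDeletes="true"/>
```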
[jira] Resolved: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-1192. Resolution: Fixed Should be taken care of with the Lucene upgrade now. solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size --- Key: SOLR-1192 URL: https://issues.apache.org/jira/browse/SOLR-1192 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.3 Environment: any Reporter: viobade Assignee: Otis Gospodnetic Fix For: 1.4 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory is then applied to these tokens, indexing goes well as long as the length of the tokens is greater than or equal to the minimum n-gram size (usually 3). Otherwise, indexing breaks at that point and the rest of the tokens are no longer indexed. This behaviour can be easily observed with the analysis tool in the Solr admin interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
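A minimal field type that reproduces the reported behaviour in the admin analysis tool (type and file names are illustrative):

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- tokens shorter than minGramSize (e.g. "to", "a") triggered the bug:
         the filter stopped, and subsequent tokens were never indexed -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="5"/>
  </analyzer>
</fieldType>
```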
[jira] Commented: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731082#action_12731082 ] Otis Gospodnetic commented on SOLR-1192: The LUCENE-1491 fix is in the Lucene repository now, so as soon as we pull new Lucene jars into Solr, I'll mark this as fixed. Feel free to test with local copies of the Lucene nightly jars tomorrow and report back. solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size --- Key: SOLR-1192 URL: https://issues.apache.org/jira/browse/SOLR-1192 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.3 Environment: any Reporter: viobade Assignee: Otis Gospodnetic Fix For: 1.4 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory is then applied to these tokens, indexing goes well as long as the length of the tokens is greater than or equal to the minimum n-gram size (usually 3). Otherwise, indexing breaks at that point and the rest of the tokens are no longer indexed. This behaviour can be easily observed with the analysis tool in the Solr admin interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-862) Solr must declare crypto usage pending SOLR-284
[ https://issues.apache.org/jira/browse/SOLR-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731084#action_12731084 ] Otis Gospodnetic commented on SOLR-862: --- Did this already happen? Solr must declare crypto usage pending SOLR-284 --- Key: SOLR-862 URL: https://issues.apache.org/jira/browse/SOLR-862 Project: Solr Issue Type: Bug Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Blocker Fix For: 1.4 Since Solr will be shipping Tika in 1.4, which uses PDFBox, which uses BouncyCastle, Solr must declare its crypto usage per ASF guidelines. See http://www.apache.org/dev/crypto.html and https://issues.apache.org/jira/browse/NUTCH-621 for references and examples of what to do. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Using Synonyms is actually narrowing the result set in some cases
Raj, could you please use the solr-user list for this? When reposting there, please include debugQuery=true output for both queries. Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: rajkovuru r...@elance.com To: solr-dev@lucene.apache.org Sent: Tuesday, July 7, 2009 3:41:13 PM Subject: Using Synonyms is actually narrowing the result set in some cases Hi, I recently introduced a small set of synonyms to be expanded at query time. I didn't and wouldn't want to modify the index, so I applied synonyms at query time. Synonyms match correctly and the query is expanded indeed; however in some cases, usually with multi-word synonyms, the query returns fewer results than it would without synonyms. Any pointers to where the problem could be? Example: before synonyms, a search for crm gave about 1000 results; with the synonyms crm, customer relationship management implemented, a search for crm gives about 200 results. One would expect Solr to return more results as a result of using synonyms, but the effect is exactly the opposite. Thanks Rah -- View this message in context: http://www.nabble.com/Using-Synonyms-is-actually-narrowing-the-result-set-in-some-cases-tp24380034p24380034.html Sent from the Solr - Dev mailing list archive at Nabble.com.
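For reference, the query-time setup in question would look roughly like this; the multi-word expansion is the usual culprit, because the expanded alternative is parsed as a multi-term (phrase-like) query that must match more tokens (filter attributes are the standard ones; the synonym entry is taken from the report):

```xml
<!-- query-time analyzer in schema.xml; synonyms.txt contains the line:
       crm, customer relationship management -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
```

The commonly recommended fix for multi-word synonyms in this era of Solr is to expand them at index time instead, precisely to avoid this query-parsing interaction.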
Re: subdirectories under lib
This sounds good to me and I like Yonik's idea, too. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Erik Hatcher e...@ehatchersolutions.com To: solr-dev@lucene.apache.org Sent: Monday, July 6, 2009 8:41:19 AM Subject: Re: subdirectories under lib Another option is to have a config option for the lib directories (plural) allowing multiple to be specified that can live anywhere, not just under solr-home. Erik On Jul 4, 2009, at 12:03 PM, Yonik Seeley wrote: How hard would it be to allow subdirectories under example/solr/lib? Seems like it would be nice to allow jars to be partitioned, so everything related to solr cell could be put under the solr/lib/solrcell directory. Then extracting request handler could be defined as lazy and we could simply tell people to remove solr/lib/solrcell if you don't need it. -Yonik http://www.lucidimagination.com
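In solrconfig.xml terms, the configurable-lib-directories option being discussed might look like the following sketch (the <lib> element and its attributes here are an assumption about the eventual design, paths hypothetical):

```xml
<!-- jars for Solr Cell live in their own subdirectory; deleting the
     directory disables the lazily-loaded extracting request handler -->
<lib dir="./lib/solrcell"/>
<!-- directories outside solr-home could also be referenced -->
<lib dir="/opt/solr/shared-lib"/>
```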
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12726209#action_12726209 ] Otis Gospodnetic commented on SOLR-908: --- Thanks Tom. TODOs are good reminders, so I'd say leave them. Port of Nutch CommonGrams filter to Solr - Key: SOLR-908 URL: https://issues.apache.org/jira/browse/SOLR-908 Project: Solr Issue Type: Wish Components: Analysis Reporter: Tom Burton-West Priority: Minor Attachments: CommonGramsPort.zip, SOLR-908.patch Phrase queries containing common words are extremely slow. We are reluctant to just use stop words due to various problems with false hits and some things becoming impossible to search with stop words turned on. (For example to be or not to be, the who, man in the moon vs man on the moon etc.) Several postings regarding slow phrase queries have suggested using the approach used by Nutch. Perhaps someone with more Java/Solr experience might take this on. It should be possible to port the Nutch CommonGrams code to Solr and create a suitable Solr FilterFactory so that it could be used in Solr by listing it in the Solr schema.xml. Construct n-grams for frequently occuring terms and phrases while indexing. Optimize phrase queries to use the n-grams. Single terms are still indexed too, with n-grams overlaid. http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1198) confine all solrconfig.xml parsing to SolrConfig.java
[ https://issues.apache.org/jira/browse/SOLR-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724581#action_12724581 ] Otis Gospodnetic commented on SOLR-1198: {quote} My real objective is to make it possible to start solr w/o a single line of xml. {quote} Could you elaborate please? Where would various configuration settings be specified? confine all solrconfig.xml parsing to SolrConfig.java - Key: SOLR-1198 URL: https://issues.apache.org/jira/browse/SOLR-1198 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch Currently, XPath evaluations are spread across the Solr code. It would be cleaner if we could do it all in one place. All the parsing can be done in SolrConfig.java. Another problem with the current design is that we are not able to benefit from re-use of the solrconfig object across cores. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1250) Search the words having ampersand (&) symbol
[ https://issues.apache.org/jira/browse/SOLR-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-1250. Resolution: Invalid Please ask on the solr-user list. Search the words having ampersand (&) symbol --- Key: SOLR-1250 URL: https://issues.apache.org/jira/browse/SOLR-1250 Project: Solr Issue Type: Task Components: search Affects Versions: 1.3 Environment: Linux Reporter: Secpath Fix For: 1.3 Original Estimate: 24h Remaining Estimate: 24h I am indexing titles in my index. My titles can also have special characters like (+ - && || ! ( ) { } [ ] ^ ~ * ? : \) When I am querying the index to search for matching titles, I am using the escape character '\' as per the doc http://lucene.apache.org/java/2_3_2/queryparsersyntax.html It works fine in most cases, except when the title contains the character '&' or '&&'. The query I use to search the index is as below in normal cases: http://myurl/solr/mdrs/select/?q=title:someTitle How do I search my index to get titles like "jakarta & apache"? I tried the queries below: http://myurl/solr/mdrs/select/?q=title:jakarta & apache http://myurl/solr/mdrs/select/?q=title:"jakarta & apache" http://myurl/solr/mdrs/select/?q=title:jakarta \& apache http://myurl/solr/mdrs/select/?q=title:"jakarta \& apache" Each of the above queries gives errors. Unable to search for my title "jakarta & apache". Please let me know how I can search for words having the ampersand (&) character -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
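Independent of escaping inside the Lucene query syntax, a raw & in a URL terminates the query parameter at the servlet layer, so it must be percent-encoded as %26 before the request is sent. A small illustration (this is generic Java URL encoding, not a Solr API; the title is taken from the report):

```java
import java.net.URLEncoder;

public class EncodeQuery {
    public static void main(String[] args) throws Exception {
        // an unencoded & would end the q= parameter before "apache" is seen
        String q = "title:\"jakarta & apache\"";
        String encoded = URLEncoder.encode(q, "UTF-8");
        System.out.println("q=" + encoded);  // q=title%3A%22jakarta+%26+apache%22
    }
}
```

Sending `?q=title%3A%22jakarta+%26+apache%22` lets Solr receive the literal ampersand inside the phrase query.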
Re: lucene releases vs trunk
I kind of agree... But will this (not) affect how quickly new features in Luceneland will get their Solr support? In other words, if we have to wait for a proper Lucene release, doesn't that mean that: 1) Solr releases will depend on Lucene releases (unless there are some Solr-only changes that don't depend on newer version of Lucene) 2) Solr releases will lag Lucene releases quite a bit because only after Lucene has been released Solr developers/contributors will be able to start work on integrating new Lucene features into Solr? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-dev@lucene.apache.org Sent: Thursday, June 25, 2009 7:18:31 AM Subject: lucene releases vs trunk For the next release cycle (presumably 1.5?) I think we should really try to stick to released versions of Lucene, and not use dev/trunk versions. Early in Solr's lifetime, Lucene trunk was more stable (APIs changed little, even on non-released versions), and Lucene releases were few and far between. Today, the pace of change in Lucene has quickened, and Lucene APIs are much more in flux until a release is made. It's also now harder to support a Lucene dev release given the growth in complexity (particularly for indexing code). Releases are made more often too, making using released versions more practical. Many of our users dislike our use of dev versions of Lucene too. And yes, 1.4 isn't out the door yet - but people often tend to hit the ground running on the next release. -Yonik http://www.lucidimagination.com
Re: lucene releases vs trunk
Hello, - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-dev@lucene.apache.org Sent: Thursday, June 25, 2009 1:41:39 PM Subject: Re: lucene releases vs trunk On Thu, Jun 25, 2009 at 1:29 PM, Chris Hostetter wrote: : This proposal was just for the next (1.5?) release cycle though. ... : I agree though - there is rapid movement in Lucene these days, and things can : be pulled back or altered fairly easily during trunk dev. Sometimes even index : format changing issues - which can be a real pain (having suffered that first : hand in the past). The closer we can stay to actual Lucene releases in : general, the better I think. I suggest we not worry about it too much until the situation arises. I'm calling attention to it because I don't believe the move to 2.9-dev was ever discussed on solr-dev. AFAIK it was committed as part of SOLR-805... something I missed, and I doubt I'm the only one. The default should be to use released Lucene versions, and we should reluctantly move off of that. Once upon a time the decision to bump the lucene-java rev in Solr was driven largely by whether people thought that that version had useful additions *and* was relatively solid. My impression more recently is that people have been bumping the rev primarily with the features/improvements in mind, and less consideration of stability, probably due to the (completely valid) assumption that solr trunk doesn't *need* to be any more stable than the lucene-java trunk, so we might as well go ahead and rev and help shake things out. Right - if we're relatively sure that a Lucene release is imminent (and will happen before a Solr release), it's not such a bad idea to upgrade. Aha, so this makes sense. Stick with the stable version until we see Lucene is preparing for a release. Then upgrade to the latest (nightly) Lucene and catch up with the goal of releasing Solr not too long after Lucene has been released. Like that? 
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723418#action_12723418 ] Otis Gospodnetic commented on SOLR-908: --- I took a super quick look and noticed: * not all classes have ASL (I think unit test classes need it, too) * Mentions of Copyright 2009, The Regents of The University of Michigan. I have a feeling this would need to be removed. * @author and @version. I know we remove @author lines, and I'm not sure if @version is really desired. Looks like a very thorough and complete patch, but I haven't tried it yet. Port of Nutch CommonGrams filter to Solr - Key: SOLR-908 URL: https://issues.apache.org/jira/browse/SOLR-908 Project: Solr Issue Type: Wish Components: Analysis Reporter: Tom Burton-West Priority: Minor Attachments: CommonGramsPort.zip, SOLR-908.patch Phrase queries containing common words are extremely slow. We are reluctant to just use stop words due to various problems with false hits and some things becoming impossible to search with stop words turned on. (For example "to be or not to be", "the who", "man in the moon" vs "man on the moon", etc.) Several postings regarding slow phrase queries have suggested using the approach used by Nutch. Perhaps someone with more Java/Solr experience might take this on. It should be possible to port the Nutch CommonGrams code to Solr and create a suitable Solr FilterFactory so that it could be used in Solr by listing it in the Solr schema.xml. Construct n-grams for frequently occurring terms and phrases while indexing. Optimize phrase queries to use the n-grams. Single terms are still indexed too, with n-grams overlaid. http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
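As a rough illustration of the technique this issue describes (not the actual Nutch or Solr implementation; the class name and the hard-coded common-word list are hypothetical, the real filter reads the word list from configuration), common grams can be sketched like this: every token is emitted as-is, and whenever an adjacent pair involves a common word, a joined bigram is overlaid:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {
    // Hypothetical common-word list; a real deployment would load this
    // from a words file in the analyzer configuration.
    static final Set<String> COMMON =
            new HashSet<>(Arrays.asList("the", "to", "be", "or", "not", "in", "on"));

    // Emit each original token, plus a joined bigram whenever either member
    // of an adjacent pair is a common word. Single terms stay indexed too,
    // with the bigrams overlaid, as the issue description says.
    static List<String> commonGrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()
                    && (COMMON.contains(tokens.get(i)) || COMMON.contains(tokens.get(i + 1)))) {
                out.add(tokens.get(i) + "_" + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // [man, man_in, in, in_the, the, the_moon, moon]
        System.out.println(commonGrams(Arrays.asList("man", "in", "the", "moon")));
    }
}
```

Indexing "man in the moon" this way yields the bigrams man_in, in_the, and the_moon alongside the single terms, so a phrase query over common words can be rewritten to the much rarer bigrams, which is where the speedup comes from.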
Re: SNMP monitoring
Absolutely and thank you in advance! Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Development Team dev.and...@gmail.com To: solr-dev@lucene.apache.org Sent: Thursday, June 18, 2009 6:48:26 AM Subject: Re: SNMP monitoring Hi devs, A while ago I posted a question to the solr-users list asking about SNMP monitoring of Solr. I got one reply suggesting the use of JMX-SNMP bridges, but upon researching these I could find a) nothing that seemed particularly good, and/or b) none that were free/OSS. Since then I've found that deploying Solr in JBoss/Jetty with the JBoss-SNMP SAR was the easiest way to get this job done. --But it still wasn't easy. Thus, my question is: would anybody like me to write up a Solr-Wiki page on how to expose Solr stats through SNMP? It's a bit involved, and is JBoss-specific, however it is a useful feature that other Solr users may benefit from. Let me know. - Daryl. On Wed, Apr 15, 2009 at 3:18 PM, Development Team wrote: Hi everybody, How would I set up SNMP monitoring of my Solr server? I've done some searching of the wiki and Google and have come up with a blank. Any pointers? - Daryl.
[jira] Resolved: (SOLR-1100) Typo fixes for solrjs docs
[ https://issues.apache.org/jira/browse/SOLR-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-1100. Resolution: Fixed Fix Version/s: 1.4 Assignee: Otis Gospodnetic Thanks Eric. Sending javascript/src/clientside/AutocompleteWidget.js Sending javascript/src/clientside/CalendarWidget.js Sending javascript/src/clientside/FacetWidget.js Sending javascript/src/clientside/TagcloudWidget.js Sending javascript/src/core/AbstractServerSideWidget.js Transmitting file data . Committed revision 786134. Typo fixes for solrjs docs -- Key: SOLR-1100 URL: https://issues.apache.org/jira/browse/SOLR-1100 Project: Solr Issue Type: Improvement Reporter: Eric Pugh Assignee: Otis Gospodnetic Priority: Minor Fix For: 1.4 Attachments: typos.patch Matthias suggested I put in a bug here for my small documentation fixes which were done against http://solrstuff.org/svn/solrjs/trunk/. Not sure if that is the latest or what is in the ASF solr contrib/javascript directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1189) Support basic auth
[ https://issues.apache.org/jira/browse/SOLR-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716621#action_12716621 ] Otis Gospodnetic commented on SOLR-1189: It would be good to have that in that example solrconfig.xml for people to see. Support basic auth -- Key: SOLR-1189 URL: https://issues.apache.org/jira/browse/SOLR-1189 Project: Solr Issue Type: New Feature Components: replication (java) Reporter: Matthew Gregg Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.4 Attachments: SOLR-1189.patch It would be extremely useful, if replication supported basic authentication. Currently a basic auth protected master/slave, cannot replicate. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1205) add a field alias feature
[ https://issues.apache.org/jira/browse/SOLR-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716623#action_12716623 ] Otis Gospodnetic commented on SOLR-1205: Am I the only person who finds that {!foo=bar} syntax very hard to parse and understand? add a field alias feature - Key: SOLR-1205 URL: https://issues.apache.org/jira/browse/SOLR-1205 Project: Solr Issue Type: New Feature Reporter: Noble Paul Fix For: 1.5 A feature which is similar to the SQL 'as' can be helpful; see the mail thread http://www.lucidimagination.com/search/document/63b63edc15092922/customizing_results#63b63edc15092922 It can be implemented as a separate request param, say {code} fl.alias=from_name1:to_name1&fl.alias=from_name2:to_name2 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
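As a sketch of how the proposed parameter might be consumed on the server side (hypothetical code, not a committed Solr implementation; the fl.alias name and from:to syntax come from the proposal above), the repeated parameter values would be parsed into a rename map:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldAliasSketch {
    // Parse repeated fl.alias=from:to values into a from -> to rename map.
    // Illustrative only: the parameter name and syntax follow the proposal
    // in SOLR-1205, not shipped Solr behavior.
    static Map<String, String> parseAliases(String[] flAliasValues) {
        Map<String, String> aliases = new LinkedHashMap<>();
        for (String v : flAliasValues) {
            int colon = v.indexOf(':');
            if (colon <= 0 || colon == v.length() - 1) {
                throw new IllegalArgumentException("expected from:to, got: " + v);
            }
            aliases.put(v.substring(0, colon), v.substring(colon + 1));
        }
        return aliases;
    }

    public static void main(String[] args) {
        System.out.println(parseAliases(
                new String[] {"from_name1:to_name1", "from_name2:to_name2"}));
    }
}
```

The response writer would then look each stored-field name up in this map and emit the aliased name instead, much like SQL's `SELECT col AS alias`.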
[jira] Commented: (SOLR-1145) Patch to set IndexWriter.defaultInfoStream from solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716417#action_12716417 ] Otis Gospodnetic commented on SOLR-1145: I agree about this belonging to solrconfig.xml -- I bet 50% of people use multicore. Patch to set IndexWriter.defaultInfoStream from solr.xml Key: SOLR-1145 URL: https://issues.apache.org/jira/browse/SOLR-1145 Project: Solr Issue Type: Improvement Reporter: Chris Harris Fix For: 1.4 Attachments: SOLR-1145.patch, SOLR-1145.patch Lucene IndexWriters use an infoStream to log detailed info about indexing operations for debugging purposes. This patch is an extremely simple way to allow logging this info to a file from within Solr: After applying the patch, set the new defaultInfoStreamFilePath attribute of the solr element in solr.xml to the path of the file where you'd like to save the logging information. Note that, in a multi-core setup, all cores will end up logging to the same infoStream log file. This may not be desired. (But it does justify putting the setting in solr.xml rather than solrconfig.xml.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-1192: --- That stems from Lucene, see LUCENE-1491. solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size --- Key: SOLR-1192 URL: https://issues.apache.org/jira/browse/SOLR-1192 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.3 Environment: any Reporter: viobade Fix For: 1.3 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory is then applied to these tokens, the indexing goes well while the length of the tokens is greater than or equal to the minimum ngram size (usually 3). Otherwise the indexing breaks at this point and the rest of the tokens are no longer indexed. This behaviour can be easily observed with the analysis tool in the Solr admin interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-1192: --- Fix Version/s: (was: 1.3) 1.4 solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size --- Key: SOLR-1192 URL: https://issues.apache.org/jira/browse/SOLR-1192 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.3 Environment: any Reporter: viobade Fix For: 1.4 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory is then applied to these tokens, the indexing goes well while the length of the tokens is greater than or equal to the minimum ngram size (usually 3). Otherwise the indexing breaks at this point and the rest of the tokens are no longer indexed. This behaviour can be easily observed with the analysis tool in the Solr admin interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-990) Add pid file to snapinstaller to skip script overruns, and recover from failure
[ https://issues.apache.org/jira/browse/SOLR-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-990. --- Resolution: Fixed Thank you, Dan! Sending src/scripts/snapinstaller Transmitting file data . Committed revision 781069. Add pid file to snapinstaller to skip script overruns, and recover from failure --- Key: SOLR-990 URL: https://issues.apache.org/jira/browse/SOLR-990 Project: Solr Issue Type: Improvement Components: replication (scripts) Reporter: Dan Rosher Assignee: Otis Gospodnetic Priority: Minor Fix For: 1.4 Attachments: SOLR-990.patch, SOLR-990.patch, SOLR-990.patch, SOLR-990.patch The pid file will allow snapinstaller to be run as fast as possible without overruns. Also it will recover from a last failed run should an older snapinstaller process no longer be running. Avoiding overruns means that snapinstaller can be run as fast as possible, but without suffering from the performance issue described here: http://wiki.apache.org/solr/SolrPerformanceFactors#head-fc7f22035c493431d58c5404ab22aef0ee1b9909 This means that one can do the following */1 * * * * /bin/snappuller && /bin/snapinstaller Even with a 'properly tuned' setup, there can be times where snapinstaller can suffer from overruns due to a lack of resources, or an unoptimized index using more resources etc. Currently the pid will live in /tmp ... perhaps it should be in the logs dir? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
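snapinstaller itself is a shell script, but the overrun guard the patch adds can be illustrated in Java (a hypothetical sketch, not the committed script): atomic creation of the pid/lock file decides whether a run proceeds or is skipped as an overrun.

```java
import java.io.File;
import java.io.IOException;

public class OverrunGuard {
    // Atomically create a lock file; if it already exists, a previous run is
    // (or was) still active and this invocation should be skipped. A real
    // implementation would also record the pid in the file and check whether
    // that process is still alive, so a crashed run cannot block future runs
    // forever (the recovery behaviour described in the issue).
    static boolean tryLock(File lockFile) {
        try {
            return lockFile.createNewFile(); // atomic: false if it already exists
        } catch (IOException e) {
            return false; // treat I/O failure as "could not acquire"
        }
    }

    public static void main(String[] args) {
        File lock = new File(System.getProperty("java.io.tmpdir"), "snapinstaller-demo.pid");
        lock.delete();                     // start clean for the demo
        System.out.println(tryLock(lock)); // first run acquires the lock
        System.out.println(tryLock(lock)); // overlapping run is skipped
        lock.delete();                     // released when the run finishes
    }
}
```

Because creation is atomic at the filesystem level, two snapinstaller invocations racing on the same lock file cannot both succeed, which is what makes running it every minute from cron safe.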
Re: update Lucene
Clearly I meant ...along with *Lucene* jars :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Otis Gospodnetic otis_gospodne...@yahoo.com To: solr-dev@lucene.apache.org Sent: Wednesday, May 27, 2009 11:59:18 PM Subject: Re: update Lucene I wonder if it would be useful to commit Lucene's CHANGES.txt into Solr along with Solr jars. It would then be very easy to tell what changed in Lucene since the version Solr has and the current version of Lucene (or some newer released version, if we were able to be behind). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Yonik Seeley To: solr-dev@lucene.apache.org Sent: Wednesday, May 27, 2009 4:58:39 PM Subject: update Lucene I think we should upgrade Lucene again since the index file format has changed: https://issues.apache.org/jira/browse/LUCENE-1654 This also contains a fix for unifying the FieldCache and ExtendedFieldCache instances. $ svn diff -r r776177 CHANGES.txt Index: CHANGES.txt === --- CHANGES.txt(revision 776177) +++ CHANGES.txt(working copy) @@ -27,7 +27,11 @@ implement Searchable or extend Searcher, you should change you code to implement this method. If you already extend IndexSearcher, no further changes are needed to use Collector. -(Shai Erera via Mike McCandless) + +Finally, the values Float.Nan, Float.NEGATIVE_INFINITY and +Float.POSITIVE_INFINITY are not valid scores. Lucene uses these +values internally in certain places, so if you have hits with such +scores it will cause problems. (Shai Erera via Mike McCandless) Changes in runtime behavior @@ -107,10 +111,10 @@ that's visited. All core collectors now use this API. (Mark Miller, Mike McCandless) -8. LUCENE-1546: Add IndexReader.flush(String commitUserData), allowing - you to record an opaque commitUserData into the commit written by - IndexReader. This matches IndexWriter's commit methods. (Jason - Rutherglen via Mike McCandless) +8. 
LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing + you to record an opaque commitUserData (maps String - String) into + the commit written by IndexReader. This matches IndexWriter's + commit methods. (Jason Rutherglen via Mike McCandless) 9. LUCENE-652: Added org.apache.lucene.document.CompressionTools, to enable compressing decompressing binary content, external to @@ -135,6 +139,9 @@ not make sense for all subclasses of MultiTermQuery. Check individual subclasses to see if they support #getTerm(). (Mark Miller) +14. LUCENE-1636: Make TokenFilter.input final so it's set only +once. (Wouter Heijke, Uwe Schindler via Mike McCandless). + Bug fixes 1. LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals() @@ -176,6 +183,9 @@ sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs. when it wasn't). (Shai Erera via Michael McCandless) +10. LUCENE-1647: Fix case where IndexReader.undeleteAll would cause +the segment's deletion count to be incorrect. (Mike McCandless) + New features 1. LUCENE-1411: Added expert API to open an IndexWriter on a prior @@ -186,10 +196,11 @@ when building transactional support on top of Lucene. (Mike McCandless) - 2. LUCENE-1382: Add an optional arbitrary String commitUserData to -IndexWriter.commit(), which is stored in the segments file and is -then retrievable via IndexReader.getCommitUserData instance and -static methods. (Shalin Shekhar Mangar via Mike McCandless) + 2. LUCENE-1382: Add an optional arbitrary Map (String - String) +commitUserData to IndexWriter.commit(), which is stored in the +segments file and is then retrievable via +IndexReader.getCommitUserData instance and static methods. +(Shalin Shekhar Mangar via Mike McCandless) 3. LUCENE-1406: Added Arabic analyzer. (Robert Muir via Grant Ingersoll) @@ -311,6 +322,10 @@ 25. LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take deletions into account when considering merges. (Yasuhiro Matsuda via Mike McCandless) + +26. 
LUCENE-1550: Added new n-gram based String distance measure for spell checking. +See the Javadocs for NGramDistance.java for a reference paper on why this is helpful (Tom Morton via Grant Ingersoll) + Optimizations -Yonik http://www.lucidimagination.com
Re: update Lucene
I wonder if it would be useful to commit Lucene's CHANGES.txt into Solr along with Solr jars. It would then be very easy to tell what changed in Lucene since the version Solr has and the current version of Lucene (or some newer released version, if we were able to be behind). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-dev@lucene.apache.org Sent: Wednesday, May 27, 2009 4:58:39 PM Subject: update Lucene I think we should upgrade Lucene again since the index file format has changed: https://issues.apache.org/jira/browse/LUCENE-1654 This also contains a fix for unifying the FieldCache and ExtendedFieldCache instances. $ svn diff -r r776177 CHANGES.txt Index: CHANGES.txt === --- CHANGES.txt(revision 776177) +++ CHANGES.txt(working copy) @@ -27,7 +27,11 @@ implement Searchable or extend Searcher, you should change you code to implement this method. If you already extend IndexSearcher, no further changes are needed to use Collector. -(Shai Erera via Mike McCandless) + +Finally, the values Float.Nan, Float.NEGATIVE_INFINITY and +Float.POSITIVE_INFINITY are not valid scores. Lucene uses these +values internally in certain places, so if you have hits with such +scores it will cause problems. (Shai Erera via Mike McCandless) Changes in runtime behavior @@ -107,10 +111,10 @@ that's visited. All core collectors now use this API. (Mark Miller, Mike McCandless) -8. LUCENE-1546: Add IndexReader.flush(String commitUserData), allowing - you to record an opaque commitUserData into the commit written by - IndexReader. This matches IndexWriter's commit methods. (Jason - Rutherglen via Mike McCandless) +8. LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing + you to record an opaque commitUserData (maps String - String) into + the commit written by IndexReader. This matches IndexWriter's + commit methods. (Jason Rutherglen via Mike McCandless) 9. 
LUCENE-652: Added org.apache.lucene.document.CompressionTools, to enable compressing decompressing binary content, external to @@ -135,6 +139,9 @@ not make sense for all subclasses of MultiTermQuery. Check individual subclasses to see if they support #getTerm(). (Mark Miller) +14. LUCENE-1636: Make TokenFilter.input final so it's set only +once. (Wouter Heijke, Uwe Schindler via Mike McCandless). + Bug fixes 1. LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals() @@ -176,6 +183,9 @@ sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs. when it wasn't). (Shai Erera via Michael McCandless) +10. LUCENE-1647: Fix case where IndexReader.undeleteAll would cause +the segment's deletion count to be incorrect. (Mike McCandless) + New features 1. LUCENE-1411: Added expert API to open an IndexWriter on a prior @@ -186,10 +196,11 @@ when building transactional support on top of Lucene. (Mike McCandless) - 2. LUCENE-1382: Add an optional arbitrary String commitUserData to -IndexWriter.commit(), which is stored in the segments file and is -then retrievable via IndexReader.getCommitUserData instance and -static methods. (Shalin Shekhar Mangar via Mike McCandless) + 2. LUCENE-1382: Add an optional arbitrary Map (String - String) +commitUserData to IndexWriter.commit(), which is stored in the +segments file and is then retrievable via +IndexReader.getCommitUserData instance and static methods. +(Shalin Shekhar Mangar via Mike McCandless) 3. LUCENE-1406: Added Arabic analyzer. (Robert Muir via Grant Ingersoll) @@ -311,6 +322,10 @@ 25. LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take deletions into account when considering merges. (Yasuhiro Matsuda via Mike McCandless) + +26. LUCENE-1550: Added new n-gram based String distance measure for spell checking. +See the Javadocs for NGramDistance.java for a reference paper on why this is helpful (Tom Morton via Grant Ingersoll) + Optimizations -Yonik http://www.lucidimagination.com
[jira] Commented: (SOLR-920) Cache and reuse IndexSchema
[ https://issues.apache.org/jira/browse/SOLR-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712113#action_12712113 ] Otis Gospodnetic commented on SOLR-920: --- So if my core has its own schema.xml in the right place (in conf/schema.xml), that schema will be used, not the shared one? Cache and reuse IndexSchema --- Key: SOLR-920 URL: https://issues.apache.org/jira/browse/SOLR-920 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-920.patch if there are 1000's of cores then the cost of loading & unloading schema.xml can be prohibitive. Similar to SOLR-919, we can also cache the DOM object of schema.xml if the location on disk is the same. All the dynamic properties can be replaced lazily when they are read. We can go one step ahead in this case. The IndexSchema object is immutable. So if there are no core properties then the same IndexSchema object can be used across all the cores -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1142) faster example schema
[ https://issues.apache.org/jira/browse/SOLR-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711814#action_12711814 ] Otis Gospodnetic commented on SOLR-1142: I'd comment-out dynamic fields and I'd leave uniqueKey as I bet 99% of users need it. faster example schema - Key: SOLR-1142 URL: https://issues.apache.org/jira/browse/SOLR-1142 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Fix For: 1.4 need faster example schema: http://www.lucidimagination.com/search/document/d46ea3fa441b6d94 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-920) Cache and reuse IndexSchema
[ https://issues.apache.org/jira/browse/SOLR-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711856#action_12711856 ] Otis Gospodnetic commented on SOLR-920: --- Looks good to me. What happens when a core has a copy of schema.xml in its conf/ dir and that schema.xml is potentially different from the shared one? Cache and reuse IndexSchema --- Key: SOLR-920 URL: https://issues.apache.org/jira/browse/SOLR-920 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul if there are 1000's of cores then the cost of loading & unloading schema.xml can be prohibitive. Similar to SOLR-919, we can also cache the DOM object of schema.xml if the location on disk is the same. All the dynamic properties can be replaced lazily when they are read. We can go one step ahead in this case. The IndexSchema object is immutable. So if there are no core properties then the same IndexSchema object can be used across all the cores -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
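The caching idea under discussion, one parsed immutable IndexSchema shared by every core whose schema.xml resolves to the same on-disk path, can be sketched as follows (illustrative code with placeholder names, not Solr's actual implementation). Under this scheme, a core with its own conf/schema.xml keys the cache with a different path and would therefore get its own object, which is the natural answer to the question above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaCacheSketch {
    // Stand-in for a parsed IndexSchema: immutable, so safe to share across cores.
    static class Schema {
        final String path;
        Schema(String path) { this.path = path; }
    }

    static final ConcurrentMap<String, Schema> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger PARSES = new AtomicInteger();

    // Cores whose schema.xml resolves to the same on-disk path share one
    // parsed object; the (expensive) parse runs at most once per path.
    static Schema schemaFor(String schemaPath) {
        return CACHE.computeIfAbsent(schemaPath, p -> {
            PARSES.incrementAndGet(); // real code would parse the XML here
            return new Schema(p);
        });
    }

    public static void main(String[] args) {
        schemaFor("/shared/conf/schema.xml");
        schemaFor("/shared/conf/schema.xml");
        System.out.println(PARSES.get()); // parsed once despite two lookups
    }
}
```

Dynamic, per-core properties are what break this sharing, which is why the issue text restricts reuse to the no-core-properties case.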
[jira] Commented: (SOLR-1147) QueryElevationComponent : updating elevate.xml through HTTP
[ https://issues.apache.org/jira/browse/SOLR-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12710397#action_12710397 ] Otis Gospodnetic commented on SOLR-1147: Nicolas - I think it may make sense to edit and rename/redescribe this issue now, if you are going to make this more generic. QueryElevationComponent : updating elevate.xml through HTTP --- Key: SOLR-1147 URL: https://issues.apache.org/jira/browse/SOLR-1147 Project: Solr Issue Type: Improvement Affects Versions: 1.3, 1.4, 1.5 Environment: Any Reporter: Nicolas Pastorino Priority: Minor Attachments: QueryElevationAdministrationRequestHandler.java, QueryElevationAdministrationRequestHandler.java If one wants to update the configuration file for the QueryElevationComponent, directly editing the file is mandatory. Currently the process seems to be: # Replace elevate.xml in Solr's dataDir # Commit. It appears that when having elevate.xml in Solr's dataDir, and solely in this case, committing triggers a reload of elevate.xml. This does not happen when elevate.xml is stored in Solr's conf dir. As a system using Solr, I would find it handy to be able to push an updated elevate.xml file/XML through HTTP, with an automatic reload of it. This would remove the currently mandatory requirement of having direct access to the elevate.xml file, allowing more distributed architectures. This would also increase the Query Elevation system's added value by making it dynamic, configuration-wise. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1171) dynamic field name with spaces causes error
[ https://issues.apache.org/jira/browse/SOLR-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12710143#action_12710143 ] Otis Gospodnetic commented on SOLR-1171: Should field names with spaces be supported? Are they supported in Lucene (ignoring the lack of support by the QP)? dynamic field name with spaces causes error --- Key: SOLR-1171 URL: https://issues.apache.org/jira/browse/SOLR-1171 Project: Solr Issue Type: Bug Reporter: Ryan McKinley Fix For: 1.4 Stumbled into this bug. I have a dynamic field meta_set_* When I add the field: meta_set_NoData Value and try to open luke, I get this exception: {panel} May 15, 2009 3:42:06 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: undefined field Value at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1132) at org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:1094) at org.apache.solr.search.SolrQueryParser.getRangeQuery(SolrQueryParser.java:121) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1514) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1349) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1306) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1266) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:172) at org.apache.solr.handler.admin.LukeRequestHandler.getIndexedFieldsInfo(LukeRequestHandler.java:310) at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:147) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330) {panel} note the field is meta_set_gdal_NoData Value not Value I think the query parser is grabbing it... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1149) Make QParserPlugin and related classes extendible
[ https://issues.apache.org/jira/browse/SOLR-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12709104#action_12709104 ] Otis Gospodnetic commented on SOLR-1149: It's set for release in 1.4, but subject to more review. +1 from me. Make QParserPlugin and related classes extendible - Key: SOLR-1149 URL: https://issues.apache.org/jira/browse/SOLR-1149 Project: Solr Issue Type: Improvement Components: search Reporter: Kaktu Chakarabati Fix For: 1.4 Attachments: SOLR-1149.patch In a recent attempt to create a QParserPlugin which extends DisMaxQParser/FunctionQParser functionality, it became apparent that in the current state of these classes, it is not straight forward and in fact impossible to seriously build upon the existing code. To this end, I've refactored some of the involved classes which enabled me to reuse existing logic to great results. I thought I will share these changes and comment on their nature in the hope these will make sense to other solr developers/users, and at the very least cultivate a fruitful discussion about this particular area of the solr codebase. The relevant changes are as follows: * Renamed DismaxQParser class to DisMaxQParser ( in accordance with the apparent naming convention, e.g DisMaxQParserPlugin ) * Moved DisMaxQParser to its own .java file, making it a public class rather than its previous package-private visibility. This makes it possible for users to build upon its logic, which is considerable, and to my mind is a good place to start alot of custom QParser implementations. * Changed access modifiers for the QParser abstract base class to protected (were package-private). Again as above, it makes this object usable by user-defined classes that wish to define custom QParser classes. 
More generally, and on the philosophy-of-code side of things, it seems misleading to define some class members as having the default access modifier (package-private) and then letting other package-scope derived classes use these while not explicitly allowing user-defined derived classes to make use of these members. Specifically, I'm thinking of how DisMaxQParser makes use of these members: **not because it is derived from QParser, but because it simply resides in the same namespace** * Changed access modifier for the QueryParsing.StrParser inner class and its constructors to public. Again as above, same issue of having same-package classes enjoy the benefit of being in the same namespace (FunctionQParser.parse() uses it like so), while user-defined classes cannot. Particularly in this case it is pretty bad since this class advertises itself as a collection of utilities for query parsing in general - great resource, should probably even live elsewhere (common.utils?) * Changed Function.FunctionWeight inner class data member modifiers to protected (were default - package-private). This allowed me to inherit from FunctionQuery as well as make use of its original FunctionWeight inner class while overriding some of the latter's methods. This is in the same spirit as the changes above. Please also note this follows the common Query/Weight implementation pattern in the lucene codebase, see for example the BooleanQuery/BooleanWeight code. All in all these are relatively minor changes which unlock a great deal of functionality to 3rd party developers, which I think is ultimately a big part of what solr is all about - extendability. It is also perhaps a cue for a more serious refactoring of the QParserPlugin hierarchy, although I will leave such bold exclamations to another occasion. Attached is a patch file, having passed the usual coding-style/unit testing cycle. -Chak -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
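The access-modifier argument above can be illustrated with a minimal self-contained sketch (hypothetical class names, not Solr's actual QParser API): a protected member lets a user-defined subclass in any package reuse base-class state, which package-private visibility would forbid outside the defining package.

```java
// Minimal illustration of the extension-point argument above.
// Hypothetical names; this is not Solr's actual QParser API.
abstract class QParserBase {
    // protected: visible to subclasses in *any* package, unlike package-private
    protected String qstr;

    protected QParserBase(String qstr) {
        this.qstr = qstr;
    }

    abstract String parse();
}

class CustomSubPhraseParser extends QParserBase {
    CustomSubPhraseParser(String qstr) {
        super(qstr);
    }

    @Override
    String parse() {
        // A user-defined parser can read the inherited field directly.
        return "parsed:" + qstr.trim().toLowerCase();
    }
}

public class Main {
    public static void main(String[] args) {
        System.out.println(new CustomSubPhraseParser("  Foo Bar  ").parse());
    }
}
```

With package-private fields, the subclass would compile only if it lived in the base class's own package, which is exactly the limitation the patch removes.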
[jira] Commented: (SOLR-1147) QueryElevationComponent : updating elevate.xml through HTTP
[ https://issues.apache.org/jira/browse/SOLR-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12709241#action_12709241 ] Otis Gospodnetic commented on SOLR-1147: I'm +1 for the idea of being able to push elevate config (and really other config files, too!) from a remote system into Solr. I only skimmed the patch. It would be good to add a unit test. Could you do that? You'll also want to add the ASL on top of the source code. It may also be good to remove eZ publish references from the Javadoc (having that in the javadoc doesn't really help developers using Solr). Is that at the end of QueryElevationAdministrationRequestHandler.class + really needed? Please note the bit about the code formatting here: http://wiki.apache.org/solr/HowToContribute#head-59ae13df098fbdcc46abdf980aa8ee76d3ee2e3b Thanks! QueryElevationComponent : updating elevate.xml through HTTP --- Key: SOLR-1147 URL: https://issues.apache.org/jira/browse/SOLR-1147 Project: Solr Issue Type: Improvement Affects Versions: 1.3, 1.4, 1.5 Environment: Any Reporter: Nicolas Pastorino Priority: Minor Attachments: QueryElevationAdministrationRequestHandler.java, QueryElevationAdministrationRequestHandler.java If one wants to update the configuration file for the QueryElevationComponent, direct editing of the file is mandatory. Currently the process seems to be: # Replace elevate.xml in Solr's dataDir # Commit. It appears that when elevate.xml is in Solr's dataDir, and solely in this case, committing triggers a reload of elevate.xml. This does not happen when elevate.xml is stored in Solr's conf dir. As a system using Solr, I would find it handy to be able to push an updated elevate.xml file/XML through HTTP, with an automatic reload of it. This would remove the current requirement of having direct access to the elevate.xml file, allowing more distributed architectures. 
This would also increase the Query Elevation system's added value by making it dynamic, configuration-wise. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer
[ https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702434#action_12702434 ] Otis Gospodnetic commented on SOLR-822: --- Todd's comment from Oct 23, 2008 caught my attention: {quote} It should also work for existing filters like LowerCase. Seems like it has the potential to be faster than the filters, as it doesn't have to perform the same replacement multiple times if a particular character is replicated into multiple tokens, like in NGramTokenizer or CJKTokenizer. {quote} Couldn't we replace LowerCaseFilter then? Or does LCF still have some unique value? Ah, it does - it makes it possible to put it *after* something like WordDelimiterFilterFactory. Lowercasing at the very beginning would make it impossible for WDFF to do its job. Never mind. Leaving for posterity. CharFilter - normalize characters before tokenizer -- Key: SOLR-822 URL: https://issues.apache.org/jira/browse/SOLR-822 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 Attachments: character-normalization.JPG, sample_mapping_ja.txt, sample_mapping_ja.txt, SOLR-822-for-1.3.patch, SOLR-822-renameMethod.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch A new plugin which can be placed in front of the tokenizer. {code:xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping_ja.txt" />
    <tokenizer class="solr.MappingCJKTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code} charFilter elements can be multiple (chained). I'll post a JPEG file to show a character normalization sample soon. MOTIVATION: In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and Morphological Analyzer. When we use a morphological analyzer, because the analyzer uses a Japanese dictionary to detect terms, we need to normalize characters before tokenization. I'll post a patch soon, too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-633) QParser for use with user-entered query which recognizes subphrases as well as allowing some other customizations on per field basis
[ https://issues.apache.org/jira/browse/SOLR-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700422#action_12700422 ] Otis Gospodnetic commented on SOLR-633: --- This description could sure use an example! :) I read it 3 times and still don't have a good picture of what this is really about. QParser for use with user-entered query which recognizes subphrases as well as allowing some other customizations on per field basis Key: SOLR-633 URL: https://issues.apache.org/jira/browse/SOLR-633 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Environment: All Reporter: Preetam Rao Priority: Minor Fix For: 1.5 Create a request handler (actually a QParser) for use with user-entered queries, with the following features: a) Take a user query string and try to match it against multiple fields, while recognizing sub-phrase matches. b) For each field, give the parameters below: 1) phraseBoost - the factor which decides how good an n-token sub-phrase match is compared to an (n-1)-token sub-phrase match. 2) maxScoreOnly - If there are multiple sub-phrase matches, pick only the highest. 3) ignoreDuplicates - If the same sub-phrase query matches multiple times, pick only one. 4) disableOtherScoreFactors - Ignore tf, query norm, idf and any other parameters which are not relevant. c) Try to provide all the parameters similar to dismax. Reuse or extend dismax. Other suggestions and feedback appreciated :-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Contributing Translations
I like the multilingualness in general... but in this case I think Grant is correct about non-primary-language docs getting outdated quickly. It's hard to keep even just the English docs up to date! And stale, incorrect docs are worse than no docs. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Grant Ingersoll gsing...@apache.org To: solr-dev@lucene.apache.org Sent: Monday, April 13, 2009 3:11:33 PM Subject: Re: Contributing Translations First off, let me say that I would love to see translations of Solr docs. My main concern is one of maintainability. If we agree to commit translations, then we as committers need to be able to maintain them as well. I am not sure which is worse, no translations or out-of-date translations. Say, for example, that I make a patch that changes how the spell checker works in Solr. As an English speaker, I can easily update the English docs as part of my patch, but I wouldn't even know where to begin with, say, Swahili (picking, as an example, a language I feel safe saying none of our committers speak, not b/c anyone is proposing a Swahili translation). So, now, it is up to the community to fix that documentation. Which is, of course, fine, except I'd venture to say most committers wouldn't even be in a position to know whether the patch is good, so we'd have to take it on faith. Committing on faith isn't usually a good thing. We should look into how other Apache projects handle it before committing to saying we are going to support other languages. I can ask over on commun...@apache.org if people would like. On Apr 9, 2009, at 10:40 PM, Green Crescent Translations wrote: Hello, I'm a project manager for Green Crescent Translations and I'm always looking to assist the open source community by providing translations of web sites, manuals, user interfaces and such. If you're interested, please let us know. We'd be happy to translate your web site documentation into needed languages. 
Just let me know which languages and what texts are essential and we'd be happy to help. Many thanks, Jonathan
[jira] Commented: (SOLR-634) Solr user interface
[ https://issues.apache.org/jira/browse/SOLR-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695093#action_12695093 ] Otis Gospodnetic commented on SOLR-634: --- Lars, how come you opted to use HTTPClient directly instead of using SolrJ? (I see no mention of solrj in the manual either). Or perhaps you have a SolrAdapter version that uses SolrJ by now? Thanks. Solr user interface --- Key: SOLR-634 URL: https://issues.apache.org/jira/browse/SOLR-634 Project: Solr Issue Type: New Feature Components: web gui Reporter: Lars Kotthoff Attachments: SOLR-634.patch, solr-ui.tar.gz Provide an example user interface for Solr (web application) for people to try out Solr's capabilities. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: replication (request handler) Qtime goes mad?
Could you please re-send your message to solr-user instead? Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: sunnyfr johanna...@gmail.com To: solr-dev@lucene.apache.org Sent: Thursday, March 26, 2009 9:29:40 AM Subject: replication (request handler) Qtime goes mad? Hi, I just applied replication by requestHandler, and since then the QTime went mad and can reach long times: <int name="QTime">9068</int>. Without this replication, QTime can be around 1 sec. I have 14M docs stored for 11G, so not a lot of data stored. I have servers with 8G and Tomcat uses 7G. I'm updating every 30 min, which is about 50,000 docs. Have a look as well at my CPUs, which are also quite full. Do you have an idea? Am I missing a patch? Thanks a lot, Solr Specification Version: 1.3.0.2009.01.22.13.51.22 Solr Implementation Version: 1.4-dev exported - root - 2009-01-22 13:51:22 http://www.nabble.com/file/p22722028/cpu_.jpg cpu_.jpg -- View this message in context: http://www.nabble.com/replication-%28request-handler%29-Qtime-goes-mad--tp22722028p22722028.html Sent from the Solr - Dev mailing list archive at Nabble.com.
Re: LinkedIn open source project: kamikaze/lucene-ext
Hi, At which point would you say the number of cached bitsets should be considered excessive? Simply a function of bitset size (index size) and memory/JVM heap? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jason Rutherglen jason.rutherg...@gmail.com To: solr-dev@lucene.apache.org Sent: Tuesday, March 24, 2009 2:48:25 PM Subject: Re: LinkedIn open source project: kamikaze/lucene-ext http://bobo-browse.wiki.sourceforge.net/ For faceting, the Bobo library from LinkedIn may be useful in cases where the number of cached bitsets is excessive. On Sun, Mar 22, 2009 at 8:35 PM, Lance Norskog wrote: LinkedIn open-sourced a pile of DocSet compression implementations as Lucene-Ext, or kamikaze: http://code.google.com/p/lucene-ext/wiki/Kamikaze Has anyone looked at using these in Solr? -- Lance Norskog goks...@gmail.com 650-922-8831 (US)
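One back-of-the-envelope answer to the question above (my own arithmetic, not from the thread): an uncompressed bitset over a maxDoc-document index costs roughly maxDoc/8 bytes, so the number of bitsets a cache can hold is approximately the heap budget divided by that figure.

```java
// Rough sizing math for cached bitsets; my own estimate, not from the thread.
public class BitsetBudget {
    // Approximate bytes for one uncompressed bitset over maxDoc documents.
    static long bitsetBytes(long maxDoc) {
        return maxDoc / 8;
    }

    // Roughly how many such bitsets fit in a given heap budget.
    static long maxCachedBitsets(long heapBytes, long maxDoc) {
        return heapBytes / bitsetBytes(maxDoc);
    }

    public static void main(String[] args) {
        long maxDoc = 14_000_000L; // e.g. a 14M-document index
        System.out.println(bitsetBytes(maxDoc));                // 1750000 bytes, ~1.7 MB each
        System.out.println(maxCachedBitsets(1L << 30, maxDoc)); // 613 bitsets in a 1 GB budget
    }
}
```

So for a 14M-doc index, a few hundred cached bitsets already consume on the order of a gigabyte; compressed DocSet representations like kamikaze's change that math considerably.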
[jira] Commented: (SOLR-1079) Rename omitTf to omitTermFreqAndPositions
[ https://issues.apache.org/jira/browse/SOLR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688382#action_12688382 ] Otis Gospodnetic commented on SOLR-1079: I agree that shorter is better... though we should avoid cryptic or misleading names. I think omitTf is misleading. I'd rather we think of something that's maybe less descriptive (since one will need to look at the docs anyway), but not misleading (making the person think looking at the docs is not necessary) - maybe omitTermSomething? omitTermInfo? That would sort of match Lucene's TermInfo object (which doesn't encompass Payloads, though). Rename omitTf to omitTermFreqAndPositions - Key: SOLR-1079 URL: https://issues.apache.org/jira/browse/SOLR-1079 Project: Solr Issue Type: Improvement Components: documentation, update Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.4 LUCENE-1561 has renamed omitTf. See http://www.lucidimagination.com/search/document/376c1c12dd464164/lucene_1561_and_omittf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: JCache API and EHCache
Want to open a JIRA issue (Enhancement?) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: KaktuChakarabati jimmoe...@gmail.com To: solr-dev@lucene.apache.org Sent: Monday, March 23, 2009 3:12:16 PM Subject: JCache API and EHCache Hey, What do you guys think about overhauling the caching layer to be compliant with the upcoming JCache API (JSR-107)? Specifically, I've been experimenting with ehcache (http://ehcache.sourceforge.net/ , Apache OS license) and it seems to be a very comprehensive implementation, as well as fully compliant with the API. I think the benefits are numerous: with respect to ehcache itself, it seems to be a very mature implementation, supporting most classical cache schemes as well as some interesting distributed cache options (and of course, performance-wise it's very attractive in terms of reported multi-CPU scaling performance and some of the benchmark figures they show). Further, abstracting away the caches to use the JCache API would probably make it easier in the future to swap the whole caching layer for other implementations that will probably crop up. Maybe for the 1.5 roadmap? Just a thought... Chak -- View this message in context: http://www.nabble.com/JCache-API-and-EHCache-tp22667097p22667097.html Sent from the Solr - Dev mailing list archive at Nabble.com.
[jira] Commented: (SOLR-1065) Add a ContentStreamDataSource to DIH to accept post data
[ https://issues.apache.org/jira/browse/SOLR-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683502#action_12683502 ] Otis Gospodnetic commented on SOLR-1065: {quote} The regular update handler can only handle XML in the standard format. With DIH you can post any XML or any other file. Moreover, DIH lets you apply custom transformations to the data. It is also possible to mix the uploaded data with other DataSources (DB) before creating the documents {quote} Is there a reason why this can't be added to the core update handler? Add a ContentStreamDataSource to DIH to accept post data Key: SOLR-1065 URL: https://issues.apache.org/jira/browse/SOLR-1065 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-1065.patch, SOLR-1065.patch, SOLR-1065.patch It is a common requirement to push data to DIH. Currently it is not possible. A ContentStreamDataSource can easily solve this problem. Sample configuration: {code:xml}
<dataSource type="ContentStreamDataSource"/>
{code} This datasource does not need any extra configuration. Make a normal POST request with the data as the body. The params remain the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: faster example schema
+1 I think we could try just commenting out the kitchen sink portions and avoid maintaining 2 config files. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Shalin Shekhar Mangar shalinman...@gmail.com To: solr-dev@lucene.apache.org; yo...@lucidimagination.com Sent: Sunday, March 8, 2009 1:20:50 AM Subject: Re: faster example schema +1 perhaps schema.xml and schema-example.xml? On Sat, Mar 7, 2009 at 8:42 PM, Yonik Seeley wrote: I've occasionally run across people going with another search engine because it was faster at indexing. The example schema that people may be using as a base to do their benchmarking (with perhaps minimal modifications) is slow. There are many people out there that check what's fastest first, and *then* check if it is satisfactory to meet their needs in other areas. With very simple synthetic test documents (just a few fields each) and the CSV loader, I've personally seen the indexing rate go from ~330/sec to ~3000/sec when I removed the default field values, term vectors, copyFields, etc. The default example schema should still be able to show how something can be done, but that doesn't mean it needs to be enabled by default. So what do people think about speeding up the default/example schema before 1.4? -Yonik http://www.lucidimagination.com -- Regards, Shalin Shekhar Mangar.
[jira] Updated: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory
[ https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-346: -- Fix Version/s: (was: 1.3.1) 1.4 need to improve snapinstaller to ignore non-snapshots in data directory --- Key: SOLR-346 URL: https://issues.apache.org/jira/browse/SOLR-346 Project: Solr Issue Type: Improvement Components: replication (scripts) Affects Versions: 1.2, 1.3 Reporter: Bill Au Assignee: Bill Au Priority: Minor Fix For: 1.4 Attachments: solr-346.patch http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already installed A directory in the Solr data directory is causing snapinstaller to fail. Snapinstaller should be improved to ignore as much non-snapshot content as possible. It can use a regular expression to look for snapshot.dd where d is a digit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
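A regex along those lines can be sketched in shell (a hypothetical filter, assuming the snapshot.YYYYMMDDHHMMSS naming used by Solr's replication scripts): it keeps real snapshots and drops entries such as the temp-snapshot directory from the message above.

```shell
# Keep only names of the form snapshot.<14 digits>; everything else
# (temp-snapshot.*, index, lost+found, ...) is ignored.
printf '%s\n' snapshot.20070816120113 temp-snapshot.20070816120113 index \
  | grep -E '^snapshot\.[0-9]{14}$'
```

Run against a real data directory this would be `ls /opt/solr/data | grep -E ...`; anchoring with `^` and `$` is what excludes the `temp-snapshot.*` prefix variant.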
[jira] Reopened: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory
[ https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic reopened SOLR-346: --- need to improve snapinstaller to ignore non-snapshots in data directory --- Key: SOLR-346 URL: https://issues.apache.org/jira/browse/SOLR-346 Project: Solr Issue Type: Improvement Components: replication (scripts) Affects Versions: 1.2, 1.3 Reporter: Bill Au Assignee: Bill Au Priority: Minor Fix For: 1.4 Attachments: solr-346.patch http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already installed A directory in the Solr data directory is causing snapinstaller to fail. Snapinstaller should be improved to ignore as much non-snapshot content as possible. It can use a regular expression to look for snapshot.dd where d is a digit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: An ExternalIndexField implementation with multicore
Isn't Lucene's ParallelReader meant to address such use cases? Don't ask me for details; the actual use of PR has always seemed a bit fuzzy to me because of its requirement to keep docIDs in sync. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com To: solr-dev@lucene.apache.org Sent: Thursday, February 19, 2009 8:12:28 PM Subject: An ExternalIndexField implementation with multicore hi, Just the way we have an ExternalFileField, is it possible to refer to a field (ExternalIndexField) in another index (which lives in another core)? I would not want to search on that field, but I may wish to use it to filter or sort, or as a ValueSource in a Function. The use case is as follows. -- I have a large index with huge docs which change less frequently (think of a mailbox). The user may arbitrarily apply/remove tags on that, but I may not wish to reindex the mails where the tags are applied. I want to just add a small doc (mail unique_id and the tag) into another index in another core. When I query, I wish to apply a filter on the label, or when I retrieve the mail details I want to get the tags (stored field) applied to it. Another one: I have a huge index of products which users can vote up or down (say popularity). I may want to add the popularity of the item into another index, and when I query I wish to sort by the popularity. The commits on the other external index will be more frequent than on the main index. What are the challenges in implementing something like this? I wish to raise a Jira issue if it looks feasible -- --Noble Paul
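The mailbox use case can be sketched with a self-contained toy model (plain Java maps standing in for the two cores; this is not Lucene's ParallelReader nor any real Solr API): the large main store stays untouched while a small side map carries the frequently changing tags, and filtering consults only the side map.

```java
import java.util.*;

// Toy model of the "external index" idea: a large immutable main store
// plus a small, frequently updated side store of id -> tags.
public class ExternalTagSketch {
    // Filter main-store ids by a tag held only in the small side "index".
    static List<String> filterByTag(Map<String, String> mails,
                                    Map<String, Set<String>> tags,
                                    String tag) {
        List<String> hits = new ArrayList<>();
        for (String id : mails.keySet()) {
            if (tags.getOrDefault(id, Collections.emptySet()).contains(tag)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Main "index": large, rarely reindexed mail docs keyed by unique id.
        Map<String, String> mails = new LinkedHashMap<>();
        mails.put("m1", "Quarterly report attached");
        mails.put("m2", "Lunch on Friday?");
        mails.put("m3", "Invoice #4711");

        // Side "index": small id -> tags mapping; applying or removing a
        // tag never touches the main store.
        Map<String, Set<String>> tags = new HashMap<>();
        tags.put("m1", new HashSet<>(Arrays.asList("work")));
        tags.put("m3", new HashSet<>(Arrays.asList("work", "finance")));

        System.out.println(filterByTag(mails, tags, "work")); // [m1, m3]
    }
}
```

The hard part the thread alludes to is exactly what this toy hides: in Lucene, the "join key" would have to be the unique id (or synchronized docIDs, as with ParallelReader), and the two cores commit on independent schedules.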
[jira] Updated: (SOLR-952) duplicated code in (Default)SolrHighlighter and HighlightingUtils
[ https://issues.apache.org/jira/browse/SOLR-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-952: -- Description: A large quantity of code is duplicated between the deprecated HighlightingUtils class and the newer SolrHighlighter and DefaultSolrHighlighter (which have been getting bug fixes and enhancements). The Utils class is no longer used anywhere in Solr, but people writing plugins may be taking advantage of it, so it should be cleaned up. (was: A large quantity of code is duplicated between the deprecated HighlightingUtils class and the newer SolrHighlighter and DefaultSolrHighlighter (which have been getting bug fixes and enhancements). The Utils class is no longer used anywhere in Solr, but people writing plugins may be taking advantage of it, so it should be cleaned up.) Fix Version/s: 1.4 duplicated code in (Default)SolrHighlighter and HighlightingUtils - Key: SOLR-952 URL: https://issues.apache.org/jira/browse/SOLR-952 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Chris Harris Priority: Minor Fix For: 1.4 Attachments: SOLR-952.patch A large quantity of code is duplicated between the deprecated HighlightingUtils class and the newer SolrHighlighter and DefaultSolrHighlighter (which have been getting bug fixes and enhancements). The Utils class is no longer used anywhere in Solr, but people writing plugins may be taking advantage of it, so it should be cleaned up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1022) suggest multiValued for ignored field
[ https://issues.apache.org/jira/browse/SOLR-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674507#action_12674507 ] Otis Gospodnetic commented on SOLR-1022: I must have missed some related thread on the ML... but can you explain what you mean by an unmatched multi-valued field? And what does "ignored field" mean? Thanks. suggest multiValued for ignored field - Key: SOLR-1022 URL: https://issues.apache.org/jira/browse/SOLR-1022 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Environment: Mac OS 10.5 java 1.5 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1022.patch Original Estimate: 1h Remaining Estimate: 1h We are actually using the suggested ignored field in the schema. I have found, however, that Solr still throws an error 400 if I send in an unmatched multi-valued field. It seems that if I set this ignored field to be multiValued, then a document with unrecognized single- or multiple-value fields is successfully indexed. The attached patch alters this suggested item in the schema. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-670) UpdateHandler must provide a rollback feature
[ https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672441#action_12672441 ] Otis Gospodnetic commented on SOLR-670: --- Is it possible that the new rollback causes the IndexWriter to be closed on error, which then causes the following error next time you try to add a (valid) document?

Feb 10, 2009 5:46:28 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 1
Feb 10, 2009 5:46:28 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
  at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:397)
  at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:402)
  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2108)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:218)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
  at org.mortbay.jetty.Server.handle(Server.java:285)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
  at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
  at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

After rollback is invoked, is one supposed to execute some other command to get Solr in a healthy state? UpdateHandler must provide a rollback feature - Key: SOLR-670 URL: https://issues.apache.org/jira/browse/SOLR-670 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-670.patch, SOLR-670.patch, SOLR-670.patch Lucene IndexWriter already has a rollback method. There should be a counterpart for the same in _UpdateHandler_ so that users can do a rollback over http -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-670) UpdateHandler must provide a rollback feature
[ https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672523#action_12672523 ] Otis Gospodnetic commented on SOLR-670: --- That was with Solr trunk (svn up-ed right before trying). I did not call commit after rollback when that happened, though I *think* I tried adding commit, too, and that didn't do anything either. UpdateHandler must provide a rollback feature - Key: SOLR-670 URL: https://issues.apache.org/jira/browse/SOLR-670 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-670.patch, SOLR-670.patch, SOLR-670.patch Lucene IndexWriter already has a rollback method. There should be a counterpart for the same in _UpdateHandler_ so that users can do a rollback over http -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1005) DoubleMetaphone Filter Produces NullpointerException on zero-length token
[ https://issues.apache.org/jira/browse/SOLR-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-1005. Resolution: Fixed Assignee: Otis Gospodnetic Thanks Michael. Committed revision 741721. DoubleMetaphone Filter Produces NullpointerException on zero-length token - Key: SOLR-1005 URL: https://issues.apache.org/jira/browse/SOLR-1005 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Environment: jdk 1.6.10, tomcat 6.x Reporter: Michael Henson Assignee: Otis Gospodnetic Attachments: solr-1005.zip If any token given to the DoubleMetaphoneFilter is empty (Token exists, 0 length), then the encoder will return null instead of a metaphone encoded string. The current code assumes that there will always be a valid object returned. Proposed solution: Make sure 0-length tokens are skipped at the top branch where the code checks whether or not we have a Token object at all. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [Solr Wiki] Update of LBHttpSolrServer by OtisGospodnetic
I'd simply address that first. I feel that's the first question people will ask (themselves). But, sorry for the interruption. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com To: solr-dev@lucene.apache.org Sent: Tuesday, February 3, 2009 12:50:24 PM Subject: Re: [Solr Wiki] Update of LBHttpSolrServer by OtisGospodnetic isn't this the same as the When to use this? section ? why do we need a separate section? On Tue, Feb 3, 2009 at 9:50 PM, Apache Wiki wrote: Dear Wiki user, You have subscribed to a wiki page or wiki category on Solr Wiki for change notification. The following page has been changed by OtisGospodnetic: http://wiki.apache.org/solr/LBHttpSolrServer

== What is LBHttpSolrServer? ==
- LB!HttpSolrServer or !LoadBalanced !HttpSolrServer is just a wrapper to !CommonsHttpSolrServer. This is useful when you have multiple !SolrServers and the requests need to be Load Balanced among them. it offers automatic failover when a server goes down and it detects when the server comes back up
+ LB!HttpSolrServer or !LoadBalanced !HttpSolrServer is just a wrapper to !CommonsHttpSolrServer. This is useful when you have multiple !SolrServers and the requests need to be Load Balanced among them. it offers automatic failover when a server goes down and it detects when the server comes back up.
+
+ TODO: address "Why would I use LBHttpSolrServer instead of existing hw/sw LB-s?"

== How to use? == {{{ -- --Noble Paul
[jira] Commented: (SOLR-844) A SolrServer impl to front-end multiple urls
[ https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12670042#action_12670042 ] Otis Gospodnetic commented on SOLR-844: --- A good comment from Wunder, made on the ML: {quote} This would be useful if there was search-specific balancing, like always send the same query back to the same server. That can make your cache far more effective. wunder {quote} A SolrServer impl to front-end multiple urls Key: SOLR-844 URL: https://issues.apache.org/jira/browse/SOLR-844 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch, SOLR-844.patch Currently a {{CommonsHttpSolrServer}} can talk to only one server. This demands that the user have a LoadBalancer or do the round-robin on their own. We must have an {{LBHttpSolrServer}} which automatically does load balancing between multiple hosts. This can be backed by the {{CommonsHttpSolrServer}}. This can have the following other features: * Automatic failover * Optionally take in a file/url containing the urls of servers so that the server list can be automatically updated by periodically loading the config * Support for adding/removing servers during runtime * Pluggable load-balancing mechanism (round-robin, weighted round-robin, random, etc.) * Pluggable failover mechanisms -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-844) A SolrServer impl to front-end multiple urls
[ https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666296#action_12666296 ] Otis Gospodnetic commented on SOLR-844: --- I'm not sure there is a clear consensus about this functionality being a good thing. Perhaps we can get more people's opinions? A SolrServer impl to front-end multiple urls Key: SOLR-844 URL: https://issues.apache.org/jira/browse/SOLR-844 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch Currently a {{CommonsHttpSolrServer}} can talk to only one server. This demands that the user have a LoadBalancer or do the roundrobin on their own. We must have a {{LBHttpSolrServer}} which must automatically do a Loadbalancing between multiple hosts. This can be backed by the {{CommonsHttpSolrServer}} This can have the following other features * Automatic failover * Optionally take in a file /url containing the the urls of servers so that the server list can be automatically updated by periodically loading the config * Support for adding removing servers during runtime * Pluggable Loadbalancing mechanism. (round-robin, weighted round-robin, random etc) * Pluggable Failover mechanisms -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-844) A SolrServer impl to front-end multiple urls
[ https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666296#action_12666296 ] otis edited comment on SOLR-844 at 1/22/09 1:12 PM: I'm not sure there is a clear consensus about this functionality being a good thing (also 0 votes). Perhaps we can get more people's opinions? was (Author: otis): I'm not sure there is a clear consensus about this functionality being a good thing. Perhaps we can get more people's opinions? A SolrServer impl to front-end multiple urls Key: SOLR-844 URL: https://issues.apache.org/jira/browse/SOLR-844 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch Currently a {{CommonsHttpSolrServer}} can talk to only one server. This demands that the user have a LoadBalancer or do the roundrobin on their own. We must have a {{LBHttpSolrServer}} which must automatically do a Loadbalancing between multiple hosts. This can be backed by the {{CommonsHttpSolrServer}} This can have the following other features * Automatic failover * Optionally take in a file /url containing the the urls of servers so that the server list can be automatically updated by periodically loading the config * Support for adding removing servers during runtime * Pluggable Loadbalancing mechanism. (round-robin, weighted round-robin, random etc) * Pluggable Failover mechanisms -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
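The "pluggable Loadbalancing mechanism (round-robin, weighted round-robin, random etc)" bullet in the issue description above could be realized as an interface with interchangeable implementations. The names below are invented for illustration and are not taken from the SOLR-844 patch:

```java
import java.util.Random;

/** Sketch of a pluggable server-selection strategy (hypothetical names). */
interface ServerSelector {
    /** Return the index of the server to use, given the server count. */
    int next(int serverCount);
}

/** Cycles through servers in order: 0, 1, 2, 0, 1, 2, ... */
class RoundRobinSelector implements ServerSelector {
    private int counter = 0;
    public int next(int serverCount) {
        return counter++ % serverCount;
    }
}

/** Picks a server uniformly at random. */
class RandomSelector implements ServerSelector {
    private final Random random = new Random();
    public int next(int serverCount) {
        return random.nextInt(serverCount);
    }
}
```

A wrapper like the proposed LBHttpSolrServer would hold one such strategy and consult it before delegating each request to the chosen backing server.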
[jira] Resolved: (SOLR-960) CommonsHttpSolrServer - documentation - phase II (Addition of log in setMaxRetries as a warning for out of range input)
[ https://issues.apache.org/jira/browse/SOLR-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-960. --- Resolution: Fixed I'll assume the int-long is fine. Sending src/solrj/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java Transmitting file data . Committed revision 734606. CommonsHttpSolrServer - documentation - phase II (Addition of log in setMaxRetries as a warning for out of range input) - Key: SOLR-960 URL: https://issues.apache.org/jira/browse/SOLR-960 Project: Solr Issue Type: Improvement Components: clients - java Reporter: Kay Kay Priority: Minor Fix For: 1.4 Attachments: SOLR-960.patch Original Estimate: 1h Remaining Estimate: 1h Add javadoc for : CommonsHttpSolrServer#AGENT CommonsHttpSolrServer#_invariantParams CommonsHttpSolrServer#_followRedirects CommonsHttpSolrServer#_allowCompression , _maxRetries #setConnectionTimeout, #setSoTimeout #setConnectionManagerTimeout(int) deprecated in favor of #setConnectionManagerTimeout(long) with the same API as in HttpClient 3.1 . #setMaxRetries - there would be a warning in the log message if the maximum retries were > 1 to keep the programmer explicitly aware of the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
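The setter-with-warning described in SOLR-960 above can be sketched as follows. The class and field names are hypothetical (the real change lives in CommonsHttpSolrServer), and the assumption that values above 1 are what count as "out of range" is an inference from the issue text, since the committed diff is not visible here:

```java
/**
 * Hypothetical sketch: accept the value, but warn the programmer when it
 * falls outside the expected range, as SOLR-960 describes for setMaxRetries.
 */
class RetryConfig {
    private int maxRetries = 0;
    String lastWarning = null;  // stand-in for a real logger call

    void setMaxRetries(int maxRetries) {
        if (maxRetries > 1) {
            // out-of-range input: accepted, but logged so the caller
            // is explicitly aware of it
            lastWarning = "maxRetries " + maxRetries + " is out of the expected range";
        }
        this.maxRetries = maxRetries;
    }

    int getMaxRetries() {
        return maxRetries;
    }
}
```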
[jira] Resolved: (SOLR-849) Add bwlimit support to snappuller
[ https://issues.apache.org/jira/browse/SOLR-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-849. --- Resolution: Won't Fix No need for this since we are moving away from shell script-based replication, most likely. Add bwlimit support to snappuller - Key: SOLR-849 URL: https://issues.apache.org/jira/browse/SOLR-849 Project: Solr Issue Type: Improvement Components: replication (scripts) Reporter: Otis Gospodnetic Priority: Minor Attachments: SOLR-849.patch From http://markmail.org/message/njnbh5gbb2mvfe24 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-849) Add bwlimit support to snappuller
[ https://issues.apache.org/jira/browse/SOLR-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-849: -- Assignee: Otis Gospodnetic Add bwlimit support to snappuller - Key: SOLR-849 URL: https://issues.apache.org/jira/browse/SOLR-849 Project: Solr Issue Type: Improvement Components: replication (scripts) Reporter: Otis Gospodnetic Assignee: Otis Gospodnetic Priority: Minor Attachments: SOLR-849.patch From http://markmail.org/message/njnbh5gbb2mvfe24 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-958) CommonsHttpSolrServer - documentation ..
[ https://issues.apache.org/jira/browse/SOLR-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-958. --- Resolution: Fixed Assignee: Otis Gospodnetic Thanks! Sending src/solrj/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java Transmitting file data . Committed revision 734326. CommonsHttpSolrServer - documentation .. - Key: SOLR-958 URL: https://issues.apache.org/jira/browse/SOLR-958 Project: Solr Issue Type: Bug Components: documentation Reporter: Kay Kay Assignee: Otis Gospodnetic Priority: Minor Fix For: 1.4 Attachments: SOLR-958.patch Original Estimate: 0.17h Remaining Estimate: 0.17h clarification about ResponseParser member , useMultiPartPost -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-957) CommonParams#VERSION : Inconsistent doc
[ https://issues.apache.org/jira/browse/SOLR-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-957. --- Resolution: Fixed Thanks Kay. Sending src/common/org/apache/solr/common/params/CommonParams.java Transmitting file data . Committed revision 734329. CommonParams#VERSION : Inconsistent doc Key: SOLR-957 URL: https://issues.apache.org/jira/browse/SOLR-957 Project: Solr Issue Type: Bug Components: documentation Reporter: Kay Kay Priority: Minor Fix For: 1.4 Attachments: SOLR-957.patch Original Estimate: 1h Remaining Estimate: 1h The doc for VERSION (in CommonParams) seems to be copied from the previous field. (totally unrelated). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-956) SolrParams#getFieldInt(String, String) - inconsistent documentation
[ https://issues.apache.org/jira/browse/SOLR-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-956. --- Resolution: Fixed Assignee: Otis Gospodnetic Thanks Kay. Sending src/common/org/apache/solr/common/params/SolrParams.java Transmitting file data . Committed revision 734330. SolrParams#getFieldInt(String, String) - inconsistent documentation - Key: SOLR-956 URL: https://issues.apache.org/jira/browse/SOLR-956 Project: Solr Issue Type: Improvement Components: documentation Reporter: Kay Kay Assignee: Otis Gospodnetic Fix For: 1.4 Attachments: SOLR-956.patch Original Estimate: 1h Remaining Estimate: 1h SolrParams#getFieldInt(String, String) documentation says it returns def. if the value does not exist. There is no def. passed on to the method - so seems to be inconsistent with what the method does. It returns null if the field,param does not exist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-954) SolrQuery - better cross-referential documentation / fix inconsistent cross-reference links .
[ https://issues.apache.org/jira/browse/SOLR-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-954. --- Resolution: Fixed Assignee: Otis Gospodnetic Thanks Kay. Sending src/solrj/org/apache/solr/client/solrj/SolrQuery.java Transmitting file data . Committed revision 734332. SolrQuery - better cross-referential documentation / fix inconsistent cross-reference links . - Key: SOLR-954 URL: https://issues.apache.org/jira/browse/SOLR-954 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.3 Environment: Tomcat 6, Java 6 Reporter: Kay Kay Assignee: Otis Gospodnetic Priority: Minor Fix For: 1.4 Attachments: SOLR-954.patch, SOLR-954.patch Original Estimate: 3h Remaining Estimate: 3h SolrQuery methods need quite a bit of documentation as the javadoc appears to be blank at the moment and comments for some deprecated methods point to non-existent methods. Patch relevant to documentation available herewith. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-953) Small simplification for LuceneGapFragmenter.isNewFragment
[ https://issues.apache.org/jira/browse/SOLR-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-953. --- Resolution: Fixed Assignee: Otis Gospodnetic Thanks Chris. Sending src/java/org/apache/solr/highlight/GapFragmenter.java Transmitting file data . Committed revision 734336. Small simplification for LuceneGapFragmenter.isNewFragment -- Key: SOLR-953 URL: https://issues.apache.org/jira/browse/SOLR-953 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 1.4 Reporter: Chris Harris Assignee: Otis Gospodnetic Priority: Minor Attachments: SOLR-953.patch This little patch makes the code for LuceneGapFragmenter.isNewFragment(Token) slightly more intuitive. The method currently features the line {code} fragOffsetAccum += token.endOffset() - fragOffsetAccum; {code} This can be simplified, though, to just {code} fragOffsetAccum = token.endOffset(); {code} Maybe it's just me, but I find the latter expression's intent to be sufficiently clearer than the former to warrant committing such a change. This patch makes this simplification. Also, if you do make this simplification, then it doesn't really make sense to think of fragOffsetAccum as an accumulator anymore, so in the patch we rename the variable to just fragOffset. Tests from HighlighterTest.java pass with the patch applied. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
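The equivalence SOLR-953 relies on is easy to verify: `x += y - x` always leaves `x == y`, and for Java `int` arithmetic this holds even under overflow, since addition and subtraction wrap consistently modulo 2^32. A small demonstration of the two forms from the patch:

```java
// Demonstrates that the two statements discussed in SOLR-953 are equivalent:
//   fragOffsetAccum += token.endOffset() - fragOffsetAccum;
// leaves fragOffsetAccum equal to token.endOffset(), exactly like
//   fragOffsetAccum = token.endOffset();
class FragOffsetDemo {
    static int viaAccumulator(int fragOffsetAccum, int endOffset) {
        fragOffsetAccum += endOffset - fragOffsetAccum;
        return fragOffsetAccum;
    }

    static int viaAssignment(int fragOffsetAccum, int endOffset) {
        fragOffsetAccum = endOffset;
        return fragOffsetAccum;
    }
}
```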
[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index
[ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662626#action_12662626 ] Otis Gospodnetic commented on SOLR-308: --- Lance - anyone can add/modify a Wiki page. Do you mind adding info about this field type? Add a field that generates an unique id when you have none in your data to index Key: SOLR-308 URL: https://issues.apache.org/jira/browse/SOLR-308 Project: Solr Issue Type: New Feature Components: search Reporter: Thomas Peuss Assignee: Hoss Man Priority: Minor Fix For: 1.3 Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch This patch adds a field that generates an unique id when you have no unique id in your data you want to index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: error: exceeded limit of maxWarmingSearcher
Doesn't that mean that you are doing something that causes searchers to warm up (e.g. running snap* scripts or your new replication equivalent) and doing that so frequently that when you do this for the third time the first two searchers are still warming up? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com To: solr-dev@lucene.apache.org Sent: Monday, January 5, 2009 11:10:06 PM Subject: error: exceeded limit of maxWarmingSearcher I have implemented the javabin update functionality (SOLR-8965) and the LargeVolumeJetty testcase is failing with the following message. exceeded limit of maxWarmingSearchers=2, try again later. Jan 5, 2009 5:44:40 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {} 0 15 Jan 5, 2009 5:44:40 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1050) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:78) at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:95) Can anyone point me to what I may be doing wrong? -- --Noble Paul
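The guard producing the error in the stack trace above can be illustrated as a simple on-deck counter: a commit that would open a new searcher while the configured number of searchers are still warming is rejected rather than queued. This is an illustration of the idea only, not Solr's actual SolrCore.getSearcher() code:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustration of a maxWarmingSearchers-style guard; not Solr's actual code. */
class SearcherLimiter {
    private final int maxWarmingSearchers;
    private final AtomicInteger onDeck = new AtomicInteger(0);

    SearcherLimiter(int maxWarmingSearchers) {
        this.maxWarmingSearchers = maxWarmingSearchers;
    }

    /** Called when a commit wants to open (and warm) a new searcher. */
    void beginWarming() {
        if (onDeck.incrementAndGet() > maxWarmingSearchers) {
            onDeck.decrementAndGet();
            throw new IllegalStateException(
                "Error opening new searcher. exceeded limit of maxWarmingSearchers="
                + maxWarmingSearchers + ", try again later.");
        }
    }

    /** Called when warming finishes and the searcher is registered. */
    void finishWarming() {
        onDeck.decrementAndGet();
    }
}
```

This is why Otis's diagnosis fits: committing so often that a third warm-up starts before the first two finish trips the limit, and the fix is usually to commit less frequently rather than to raise maxWarmingSearchers.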
Re: [jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.
Quick clarifications: - Droids: http://incubator.apache.org/droids/index.html - DIH: http://wiki.apache.org/solr/DataImportHandler - Solr + Tika: http://wiki.apache.org/solr/ExtractingRequestHandler Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Ben Johnson ben.john...@jandpconsulting.co.uk To: solr-dev@lucene.apache.org Sent: Thursday, January 1, 2009 6:00:43 PM Subject: Re: [jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH. I'm watching this issue with interest, but I'm having trouble understanding the bigger picture. I am prototyping a system that uses Restlet to store and index objects (mainly MS Office and OpenOffice documents and emails), so I am planning to use Solr with Tika to index the objects. I know nothing about DIH (Distributed Index Handler?), so I'm not sure what role it plays with Solr. Is it a vendor-specific technology (from Autonomy)? What does it do? Do you give it objects to index and it handles them by passing it to one or more Solr/Tika indexing servers? And are you thinking that this would therefore be a good place to not only index the objects, but also pass the information about the digital content to DROID? Reading a bit about DROID (from TNA, The National Archives), it seems like it is used to capture information about the digital content of objects stored in a content repository. How does this fit with Solr? I thought Solr with Tika just did the indexing of text-based objects, but the actual storage of the objects would be elsewhere (probably in the file system). From what I can tell, DROID would operate on the file system objects, not the indexing information. Have I got this right? Ideally, I would also like to convert any suitable content into PDF/A format for long-term archival - probably not relevant to this issue, but I thought I'd mention it in case you see an application of this as part of email and attachment storage. 
Sorry for all the questions, but hopefully someone could clarify this for me! Thanks very much Ben Johnson -- From: Grant Ingersoll (JIRA) Sent: Thursday, January 01, 2009 7:07 PM To: Subject: [jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH. [ https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12660210#action_12660210 ] Grant Ingersoll commented on SOLR-934: -- Would it make more sense for DIH to farm out its content acquisition to a library like Droids? Then, we could have real crawling, etc. all through a pluggable connector framework. Enable importing of mails into a solr index through DIH. Key: SOLR-934 URL: https://issues.apache.org/jira/browse/SOLR-934 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Preetam Rao Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: SOLR-934.patch, SOLR-934.patch Original Estimate: 24h Remaining Estimate: 24h Enable importing of mails into solr through DIH. Take one or more mailbox credentials, download and index their content along with the content from attachments. The folders to fetch can be made configurable based on various criteria. Apache Tika is used for extracting content from different kinds of attachments. JavaMail is used for mail box related operations like fetching mails, filtering them etc. The basic configuration for one mail box is as below: {code:xml} password=something host=imap.gmail.com protocol=imaps/ {code} The below is the list of all configuration available: {color:green}Required{color} - *user* *pwd* *protocol* (only imaps supported now) *host* {color:green}Optional{color} - *folders* - comma separated list of folders. If not specified, default folder is used. Nested folders can be specified like a/b/c *recurse* - index subfolders. Defaults to true. *exclude* - comma separated list of patterns.
*include* - comma separated list of patterns. *batchSize* - mails to fetch at once in a given folder. Only headers can be prefetched in Javamail IMAP. *readTimeout* - defaults to 6ms *connectTimeout* - defaults to 3ms *fetchSize* - IMAP config. 32KB default *fetchMailsSince* - date/time in milliseconds, mails received after which will be fetched. Useful for delta import. *customFilter* - class name. {code} import javax.mail.Folder; import javax.mail.SearchTerm; clz
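The {code:xml} snippet in the SOLR-934 issue description above lost its XML tags in transit; only the attribute values (password, host, protocol) survived. Based on those surviving attributes and the required-parameter list that follows it, the data-config entity presumably looked something like the sketch below. The processor name and the user value are assumptions made for illustration; only the other attributes appear in the text above:

```xml
<document>
  <!-- Hypothetical reconstruction of the mail-import entity.
       "MailEntityProcessor" and the user value are assumed; password,
       host, and protocol come from the garbled snippet above. -->
  <entity processor="MailEntityProcessor"
          user="somebody@gmail.com"
          password="something"
          host="imap.gmail.com"
          protocol="imaps"/>
</document>
```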