[jira] Commented: (SOLR-1375) BloomFilter on a field
[ https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851637#action_12851637 ]

Jason Rutherglen commented on SOLR-1375:

{quote}Doesn't this hint at some of this stuff (haven't looked at the patch) really needing to live in Lucene index segment files merging land?{quote}

Adding this to Lucene is outside the scope of what I require, and I don't have time to do it unless it's going to be committed.

BloomFilter on a field

Key: SOLR-1375
URL: https://issues.apache.org/jira/browse/SOLR-1375
Project: Solr
Issue Type: New Feature
Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch
Original Estimate: 120h
Remaining Estimate: 120h

* A bloom filter is a read-only probabilistic set. It's useful for verifying that a key exists in a set, though it returns false positives. http://en.wikipedia.org/wiki/Bloom_filter
* The use case is indexing in Hadoop and checking for duplicates against a Solr cluster, which (when using the term dictionary or a query) is too slow and exceeds the time consumed for indexing. When a match is found, the host, segment, and term are returned. If the same term is found on multiple servers, multiple results are returned by the distributed process. (I just realized we'll also need to add the core name.)
* When new segments are created and commit is called, a new bloom filter is generated from a given field (default: id) by iterating over the term dictionary values. There's a bloom filter file per segment, which is managed on each Solr shard. When segments are merged away, their corresponding .blm files are also removed. In a future version we'll have a central server for the bloom filters so we're not abusing the thread pool of the Solr proxy and the networking of the Solr cluster (this will be done sooner rather than later, after testing this version). I held off because the central server requires syncing the Solr servers' files (which is like reverse replication).
* The patch uses the BloomFilter from Hadoop 0.20. I want to jar up only the necessary classes so we don't have a giant Hadoop jar in lib. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
* Distributed code is added and seems to work. I extended TestDistributedSearch to test over multiple HTTP servers. I chose this approach rather than the manual method used by (for example) TermVectorComponent.testDistributed because I'm new to Solr's distributed search and wanted to learn how it works (the stages are confusing). Using this method, I didn't need to set up multiple Tomcat servers and manually execute tests.
* We need more of the bloom filter options to be passable via solrconfig.
* I'll add more test cases.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
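As an illustration of the per-segment bloom filter idea in the description above, here is a minimal Python sketch. It is not the Hadoop 0.20 BloomFilter that the patch uses: the class, the SHA-256 based hashing, the default sizes, and the segment names are all assumptions made for the example. It only shows the behaviour the description relies on, namely that a lookup can return a false positive but never a false negative.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions by salting one strong hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def build_segment_filter(terms, num_bits=1024, num_hashes=3):
    """Build one filter from the term values of a single segment."""
    bloom = BloomFilter(num_bits, num_hashes)
    for term in terms:
        bloom.add(term)
    return bloom


def find_segments(key, filters_by_segment):
    """Return the names of segments whose filter reports a possible match."""
    return [
        name for name, bloom in filters_by_segment.items() if bloom.might_contain(key)
    ]
```

A caller would build one filter per segment at commit time and drop a segment's filter when that segment is merged away. Because of false positives, any hit still has to be confirmed against the real index if an exact answer is required.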
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:

Attachment: SOLR-1724.patch

Fixed the unit tests that were failing due to the switch over to using CoreContainer's initZooKeeper method. ZkNodeCoresManager is instantiated in CoreContainer. There's the beginning of a UI in zkcores.jsp. I think we still need a core move test. I'm thinking of adding backing up a core as an action that may be performed in a new cores version file.

Real Basic Core Management with Zookeeper

Key: SOLR-1724
URL: https://issues.apache.org/jira/browse/SOLR-1724
Project: Solr
Issue Type: New Feature
Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
Fix For: 1.5
Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch

Though we're implementing cloud, I need something real soon I can play with and deploy. So this'll be a patch that only deploys new cores, and that's about it.

The arch is real simple: On Zookeeper there'll be a directory that contains files that represent the state of the cores of a given set of servers, which will look like the following:

/production/cores-1.txt
/production/cores-2.txt
/production/core-host-1-actual.txt (ephemeral node per host)

Where each core-N.txt file contains:

hostname,corename,instanceDir,coredownloadpath

coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, etc.

and core-host-actual.txt contains:

hostname,corename,instanceDir,size

Every time a new core-N.txt file is added, the listening host finds its entry in the list and begins the process of trying to match the entries. Upon completion, it updates its /core-host-1-actual.txt file to its completed state or logs an error.

When all host actual files are written (without errors), then a new core-1-actual.txt file is written which can be picked up by another process that can create a new core proxy.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
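The desired-state file format in the description above (hostname,corename,instanceDir,coredownloadpath) can be sketched in Python as follows. This is only an illustration of how a listening host might find its own entries and decide what still needs installing. The function names, the CoreEntry type, and the sample hosts and paths are hypothetical, and the patch itself is Java code that may work differently.

```python
from typing import NamedTuple


class CoreEntry(NamedTuple):
    hostname: str
    corename: str
    instance_dir: str
    download_path: str


def parse_cores_file(text):
    """Parse lines of hostname,corename,instanceDir,coredownloadpath."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first three commas only, so a URL containing commas survives.
        hostname, corename, instance_dir, download_path = line.split(",", 3)
        entries.append(CoreEntry(hostname, corename, instance_dir, download_path))
    return entries


def cores_to_install(desired, installed_corenames, hostname):
    """Return this host's desired entries that are not installed yet."""
    return [
        entry
        for entry in desired
        if entry.hostname == hostname and entry.corename not in installed_corenames
    ]
```

For example, a host named host1 (a made-up name) would call cores_to_install(parse_cores_file(text), installed, "host1") each time a new cores file appears, then download and install whatever comes back.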
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839937#action_12839937 ]

Jason Rutherglen commented on SOLR-1724:

I'm starting work on the cores file upload. The cores file is in JSON format, and can be assembled by an entirely different process (i.e. the core assignment creation is decoupled from core deployment). I need to figure out how Solr HTML HTTP file uploading works... There's probably an example somewhere.
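To make the decoupling concrete, here is a hedged Python sketch of a separate process assembling a JSON cores file and a consumer validating it. The comment above does not give the JSON schema, so the layout ("version" plus a "cores" list) and the field names, which are borrowed from the plain-text format in the issue description, are assumptions.

```python
import json

# Field names borrowed from the issue description; the real schema is not shown.
REQUIRED_FIELDS = {"hostname", "corename", "instanceDir", "coredownloadpath"}


def build_cores_json(version, entries):
    """Serialize a desired-state cores version as JSON text."""
    document = {"version": version, "cores": entries}
    return json.dumps(document, indent=2, sort_keys=True)


def load_cores_json(text):
    """Parse the JSON text and check that every entry has the expected fields."""
    document = json.loads(text)
    for entry in document["cores"]:
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            raise ValueError(f"cores entry is missing fields: {sorted(missing)}")
    return document
```

Any tool that can emit this JSON could produce a new cores version, which is the point of keeping assignment separate from deployment.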
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839520#action_12839520 ]

Jason Rutherglen commented on SOLR-1724:

Started on the nodes reporting their status to separate files that are ephemeral nodes; there's no sense in keeping them around if the node isn't up, and the status is legitimately ephemeral. In this case, the status will be something like "Core download 45% (7 GB of 15 GB)".
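A status string of the kind mentioned in the comment above could be produced along these lines. This Python sketch is only an assumption about the formatting (the comment gives an approximate example, not a specification), and the function name is hypothetical. Writing the string to an ephemeral Zookeeper node is left out.

```python
GB = 1024 ** 3


def format_download_status(bytes_done, bytes_total):
    """Return a short status string for a core download in progress."""
    if bytes_total <= 0:
        raise ValueError("bytes_total must be positive")
    # Round the percentage down so the status never claims more than was done.
    percent = int(100 * bytes_done / bytes_total)
    return (
        f"Core download {percent}% "
        f"({bytes_done / GB:.0f} GB of {bytes_total / GB:.0f} GB)"
    )
```

A node would recompute this periodically during a download and overwrite its status node with the result.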
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838926#action_12838926 ]

Jason Rutherglen commented on SOLR-1724:

In thinking about this some more, in order for the functionality provided in this issue to be more useful, there could be a web based UI to easily view the master cores table. There can additionally be an easy way to upload the new cores version into Zookeeper. I'm not sure if the uploading should be web based or command line; I'm figuring web based, simply because this is more in line with the rest of Solr.

As a core is installed or is in the midst of some other process (such as backing itself up), the node/NodeCoresManager can report the ongoing status to Zookeeper. For large cores (e.g. 20 GB) it's important to see how they're doing, and if they're taking too long, begin some remedial action. The UI can display the statuses.
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:

Attachment: SOLR-1724.patch

Zipping from a Lucene directory works and has a test case. A ReplicationHandler is added by default under a unique name; if one exists already, we still create our own, for the express purpose of locking an index commit point, zipping it, then uploading it to, for example, HDFS. This part will likely be written next.
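The zipping step described above can be pictured with a short Python sketch. It assumes the caller already holds the list of file names belonging to a locked index commit point. How that list is obtained from the ReplicationHandler is not shown in this message, and the function name and sample file names are hypothetical. Uploading the archive to HDFS is out of scope here.

```python
import os
import zipfile


def zip_index_files(index_dir, file_names, zip_path):
    """Zip the named files from index_dir into zip_path; return the archived names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(file_names):
            # Store each file under its bare name so the archive unpacks flat.
            archive.write(os.path.join(index_dir, name), arcname=name)
    # Re-open the archive to report what was actually written.
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

Zipping only the files named by the commit point, rather than the whole directory, keeps files from newer, unfinished commits out of the backup.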
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:

Attachment: SOLR-1724.patch

Backing a core up works, at least according to the test case... I will probably begin to test this patch in a staging environment next, where Zookeeper is run in its own process and a real HDFS cluster is used.
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837898#action_12837898 ]

Jason Rutherglen commented on SOLR-1724:

I'm not sure how we'll handle (or if we even need to) installing a new core over an existing core of the same name, in other words core replacement. I think the instanceDir would need to be different, which means we'll need to detect and fail on the case of a new cores version (aka desired state) trying to install itself into an existing core's instanceDir. Otherwise this potential error case is costly in production. It makes me wonder about the shard id in Solr Cloud and how that can be used to uniquely identify an installed core, if a core of a given name is not guaranteed to be the same across Solr servers.
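The "detect and fail" check raised in the comment above might look like the following Python sketch. It treats a desired entry as a conflict when its instanceDir is already occupied by an installed core that differs in name or download path. That definition of a conflict, the tuple layout, and the function name are assumptions for illustration, not the patch's actual logic.

```python
def find_instance_dir_conflicts(desired, installed):
    """Return desired entries that would overwrite a different installed core.

    Both arguments are lists of (corename, instance_dir, download_path)
    tuples for a single host.
    """
    installed_by_dir = {
        instance_dir.rstrip("/"): (corename, download_path)
        for corename, instance_dir, download_path in installed
    }
    conflicts = []
    for corename, instance_dir, download_path in desired:
        existing = installed_by_dir.get(instance_dir.rstrip("/"))
        # An identical entry is simply already installed, not a conflict.
        if existing is not None and existing != (corename, download_path):
            conflicts.append((corename, instance_dir, download_path))
    return conflicts
```

A host could run this before starting any download and refuse the whole cores version if the returned list is not empty, so the failure is cheap rather than discovered midway through an install.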
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837418#action_12837418 ]

Jason Rutherglen commented on SOLR-1724:

We need a test case with a partial install, and cleaning up any extraneous files afterwards.
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:

Attachment: SOLR-1724.patch

I added a test case that simulates attempting to install a bad core. Still need to get backing up a Solr core to HDFS working.
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836896#action_12836896 ]

Jason Rutherglen commented on SOLR-1724:

I'm taking the approach of simply reusing SnapPuller and a replication handler for each core... This'll be faster to implement and more reliable for the first release (i.e. I won't run into little wacky bugs because I'll be reusing code that's well tested).
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836898#action_12836898 ]

Jason Rutherglen commented on SOLR-1724:

Actually, I just realized the whole exercise of moving a core is pointless; it's exactly the same as replication, so this is a non-issue... I'm going to work on backing up a core to HDFS...
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835819#action_12835819 ]

Jason Rutherglen commented on SOLR-1724:

We need a test case for deleted and modified cores.
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1724: --- Attachment: SOLR-1724.patch Removing cores seems to work well, on to modified cores... I checkpointing progress in case things break, I can easily roll back.

Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch

Though we're implementing cloud, I need something real soon that I can play with and deploy. So this'll be a patch that only deploys new cores, and that's about it. The arch is real simple:

On Zookeeper there'll be a directory containing files that represent the state of the cores of a given set of servers, which will look like the following:

/production/cores-1.txt
/production/cores-2.txt
/production/core-host-1-actual.txt (ephemeral node per host)

Where each core-N.txt file contains:

hostname,corename,instanceDir,coredownloadpath

coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, etc.

and core-host-actual.txt contains:

hostname,corename,instanceDir,size

Every time a new core-N.txt file is added, the listening host finds its entry in the list and begins the process of trying to match the entries. Upon completion, it updates its /core-host-1-actual.txt file to its completed state or logs an error.

When all host actual files are written (without errors), a new core-1-actual.txt file is written, which can be picked up by another process that can create a new core proxy.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
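To make the file format described above concrete, here is a minimal Java sketch that parses lines of the form hostname,corename,instanceDir,coredownloadpath and picks out the entries for one host. The names CoreEntry, parseLine and entriesForHost are hypothetical and are not taken from the patch. The sketch also assumes one entry per line and that only the last field (the download URL) may contain commas.

```java
import java.util.ArrayList;
import java.util.List;

/** One line of a cores file: hostname,corename,instanceDir,coredownloadpath (hypothetical class). */
final class CoreEntry {
    final String hostname;
    final String coreName;
    final String instanceDir;
    final String downloadPath;

    CoreEntry(String hostname, String coreName, String instanceDir, String downloadPath) {
        this.hostname = hostname;
        this.coreName = coreName;
        this.instanceDir = instanceDir;
        this.downloadPath = downloadPath;
    }

    /** Parses a single comma-separated line into its four fields. */
    static CoreEntry parseLine(String line) {
        // A limit of 4 keeps any commas inside the download URL in the last field.
        String[] parts = line.trim().split(",", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 comma-separated fields: " + line);
        }
        return new CoreEntry(parts[0].trim(), parts[1].trim(), parts[2].trim(), parts[3].trim());
    }

    /** Returns the entries of the given file content that are assigned to the given host. */
    static List<CoreEntry> entriesForHost(String fileContent, String host) {
        List<CoreEntry> result = new ArrayList<>();
        for (String line : fileContent.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            CoreEntry entry = parseLine(line);
            if (entry.hostname.equals(host)) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

A listening host would call something like entriesForHost(content, myHostname) whenever a new cores file appears, then compare the result with the cores it already has installed.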
[jira] Issue Comment Edited: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835871#action_12835871 ] Jason Rutherglen edited comment on SOLR-1724 at 2/19/10 6:36 PM: - Removing cores seems to work well, on to modified cores... I'm checkpointing progress in case things break, I can easily roll back. was (Author: jasonrutherglen): Removing cores seems to work well, on to modified cores... I checkpointing progress in case things break, I can easily roll back.
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835955#action_12835955 ] Jason Rutherglen commented on SOLR-1724: Also needed is the ability to move an existing core to a different Solr server. The core will need to be copied via direct HTTP file access, from a Solr server to another Solr server. There is no need to zip the core first. This feature is useful for core indexes that have been incrementally built, then need to be archived (i.e. the index was not constructed using Hadoop).
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835965#action_12835965 ] Jason Rutherglen commented on SOLR-1724: For the above core moving, utilizing the existing Java replication will probably be suitable. However, in all cases we need to copy the contents of all files related to the core (meaning everything under conf and data). How does one accomplish this?
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835981#action_12835981 ] Jason Rutherglen commented on SOLR-1724: {quote}Will this http access also allow a cluster with incrementally updated cores to replicate a core after a node failure? {quote} You're talking about moving an existing core into HDFS? That's a great idea... I'll add it to the list! Maybe for general actions to the system, there can be a ZK directory acting as a queue that contains actions to be performed by the cluster. When an action is completed, its corresponding action file is deleted.
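The queue idea in the comment above can be sketched without any ZooKeeper dependency. The Java class below is an in-memory stand-in, not ZooKeeper client code: a sorted map plays the role of the ZK directory, node names carry a zero-padded sequence number so that sorting by name gives arrival order, and a node is deleted only after its action has completed. The class name ActionQueue, the "action-" prefix and the string payloads are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/** In-memory stand-in for a ZK directory used as an action queue (hypothetical class). */
final class ActionQueue {
    // Node name -> action payload. TreeMap sorts by name, so zero-padded
    // sequence suffixes yield first-in, first-out order.
    private final TreeMap<String, String> nodes = new TreeMap<>();
    private long nextSequence = 0;

    /** Adds an action file to the directory and returns its node name. */
    String enqueue(String action) {
        String name = String.format("action-%010d", nextSequence++);
        nodes.put(name, action);
        return name;
    }

    /**
     * Runs pending actions in order. A node is removed only after its action
     * returns normally; if the executor throws, that node stays queued.
     */
    int drain(Consumer<String> executor) {
        int completed = 0;
        while (!nodes.isEmpty()) {
            Map.Entry<String, String> first = nodes.firstEntry();
            executor.accept(first.getValue());
            nodes.remove(first.getKey());
            completed++;
        }
        return completed;
    }

    /** Number of action files still waiting in the directory. */
    int pending() {
        return nodes.size();
    }
}
```

Deleting the node after, rather than before, the action runs means a crash mid-action leaves the action file in place for a retry, at the cost of possibly running an action twice.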
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836013#action_12836013 ] Jason Rutherglen commented on SOLR-1724: I think the check on whether a conf file's been modified, to reload the core, can borrow from the replication handler and check the diff based on the checksum of the files... Though this somewhat complicates the storage of the checksum and the resultant JSON file.
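A rough Java sketch of the checksum comparison mentioned in the comment above: compute a checksum per conf file, store the name-to-checksum map, and treat any added, removed, or changed file as a reason to reload the core. The choice of java.util.zip.Adler32 is an assumption (the comment does not name an algorithm), and ConfChecksums and its methods are hypothetical names, not code from the patch or the replication handler.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.Adler32;

/** Detects modified conf files by comparing stored and current checksums (hypothetical class). */
final class ConfChecksums {

    /** Adler-32 checksum of a file's bytes. */
    static long checksum(byte[] content) {
        Adler32 adler = new Adler32();
        adler.update(content, 0, content.length);
        return adler.getValue();
    }

    /**
     * Returns the names of files that were added, removed, or whose checksum
     * differs between the previously stored map and the current one.
     */
    static Set<String> changedFiles(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long old = previous.get(entry.getKey());
            if (old == null || !old.equals(entry.getValue())) {
                changed.add(entry.getKey()); // new or modified file
            }
        }
        for (String name : previous.keySet()) {
            if (!current.containsKey(name)) {
                changed.add(name); // file was removed
            }
        }
        return changed;
    }
}
```

A non-empty result from changedFiles would trigger a core reload. The stored map is the extra state that, as the comment notes, would have to be kept alongside the JSON file.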
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836018#action_12836018 ] Jason Rutherglen commented on SOLR-1724: Some further notes... I can reuse the replication code, but am going to place the functionality into core admin handler because it needs to work across cores and not have to be configured in each core's solrconfig. Also, we need to somehow support merging cores... Is that available yet? Looks like merge indexes is only for directories?
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836022#action_12836022 ] Jason Rutherglen commented on SOLR-1724: We need a URL type parameter to indicate whether a URL in a core info points to a zip file or to a Solr server download point.
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1724: --- Attachment: SOLR-1724.patch
* No-commit
* NodeCoresManagerTest.testInstallCores works
* There are HDFS test cases using MiniDFSCluster
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835490#action_12835490 ] Jason Rutherglen commented on SOLR-1724: I need to figure out how to integrate this with the Solr Cloud distributed search stuff... Hmm... Maybe I'll start with the Solr Cloud test cases?
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1724: --- Attachment: SOLR-1724.patch Updated to HEAD
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835513#action_12835513 ] Jason Rutherglen commented on SOLR-1724: I need to add the deletion policy before I can test this in a real environment, otherwise bunches of useless files will pile up in ZK.
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1724: --- Attachment: SOLR-1724.patch Added a way to keep a given number of host or cores files around in ZK; beyond that number, the oldest are deleted.
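The retention rule described in this update can be sketched as follows: given the names of the files currently in the directory and the number to keep, return the oldest ones for deletion. This is only an illustration in Java. FileRetention and its methods are hypothetical names, and the sketch assumes file names of the form prefix-N.txt (such as cores-12.txt), where a larger N means a newer file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Chooses which numbered files to delete so that only the newest ones remain (hypothetical class). */
final class FileRetention {

    /** Extracts N from a name such as "cores-12.txt"; assumes the "prefix-N.ext" shape. */
    static int sequenceOf(String name) {
        int dash = name.lastIndexOf('-');
        int dot = name.lastIndexOf('.');
        if (dash < 0 || dot <= dash + 1) {
            throw new IllegalArgumentException("Unexpected file name: " + name);
        }
        return Integer.parseInt(name.substring(dash + 1, dot));
    }

    /** Returns the oldest files, in ascending sequence order, that exceed the keep count. */
    static List<String> toDelete(List<String> names, int keep) {
        if (keep < 0) {
            throw new IllegalArgumentException("keep must not be negative");
        }
        List<String> sorted = new ArrayList<>(names);
        // Sort numerically, so that cores-10.txt counts as newer than cores-2.txt.
        sorted.sort(Comparator.comparingInt(FileRetention::sequenceOf));
        int excess = Math.max(0, sorted.size() - keep);
        return new ArrayList<>(sorted.subList(0, excess));
    }
}
```

Sorting on the parsed number rather than on the raw name matters once the sequence passes 9, since a plain string sort would place cores-10.txt before cores-2.txt.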
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834539#action_12834539 ] Jason Rutherglen commented on SOLR-1724: There's a wiki for this issue where the general specification is defined: http://wiki.apache.org/solr/DeploymentofSolrCoreswithZookeeper
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1724: --- Attachment: SOLR-1724.patch No-commit. NodeCoresManager[Test] needs more work. A CoreController matchHosts unit test was added to CoreControllerTest.
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833108#action_12833108 ] Jason Rutherglen commented on SOLR-1301: There still seems to be a bug where the temporary directory index isn't deleted on job completion. Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. 
The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar. I attached the jar to this issue; you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under the Apache License.
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832587#action_12832587 ] Jason Rutherglen commented on SOLR-1395: shyjuThomas, It'd be good to update this patch to the latest Katta... You're welcome to do so... For my project I only need what'll be in SOLR-1724... Integrate Katta --- Key: SOLR-1395 URL: https://issues.apache.org/jira/browse/SOLR-1395 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar Original Estimate: 336h Remaining Estimate: 336h We'll integrate Katta into Solr so that: * Distributed search uses Hadoop RPC * Shard/SolrCore distribution and management * Zookeeper based failover * Indexes may be built using Hadoop
[jira] Updated: (SOLR-1761) Command line Solr check softwares
[ https://issues.apache.org/jira/browse/SOLR-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1761: --- Attachment: SOLR-1761.patch No-commit. Here are a couple of apps that: 1) Check the query time 2) Check the last replication time. They exit with error code 1 on failure, 0 on success. Command line Solr check softwares - Key: SOLR-1761 URL: https://issues.apache.org/jira/browse/SOLR-1761 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: SOLR-1761.patch I'm in need of a command-line tool that Nagios and the like can execute to verify that a Solr server is working... Basically it'll be a jar with apps that return error codes if a given criterion isn't met.
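The check apps in SOLR-1761.patch are not shown in this message, so the following is only a rough sketch of the exit-code convention described above (0 on success, 1 on failure) applied to a query-time check. The class name SolrQueryTimeCheck, its methods, and the two command-line arguments are assumptions made for illustration; only standard JDK classes are used.

```java
import java.net.HttpURLConnection;
import java.net.URI;

final class SolrQueryTimeCheck {
    static final int OK = 0;
    static final int FAILURE = 1;

    /** Pure decision: succeed only if the request worked and was fast enough. */
    static int exitCode(boolean requestSucceeded, long elapsedMs, long maxMs) {
        return (requestSucceeded && elapsedMs <= maxMs) ? OK : FAILURE;
    }

    /** Times one HTTP GET against the given query URL and maps it to an exit code. */
    static int check(String queryUrl, long maxMs) {
        // A timeout of 0 would mean "wait forever", so clamp to at least 1 ms.
        int timeout = (int) Math.max(1L, Math.min(maxMs, Integer.MAX_VALUE));
        long start = System.nanoTime();
        boolean ok;
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(queryUrl).toURL().openConnection();
            conn.setConnectTimeout(timeout);
            conn.setReadTimeout(timeout);
            ok = conn.getResponseCode() == 200;
            conn.disconnect();
        } catch (Exception e) {
            ok = false; // bad URL, refused connection, timeout, and so on
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        return exitCode(ok, elapsedMs, maxMs);
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: SolrQueryTimeCheck <query URL> <max milliseconds>");
            System.exit(FAILURE);
        }
        System.exit(check(args[0], Long.parseLong(args[1])));
    }
}
```

Keeping the pass/fail decision in a separate method makes it testable without a running Solr server; a monitoring system would only look at the process exit status.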
[jira] Updated: (SOLR-1761) Command line Solr check softwares
[ https://issues.apache.org/jira/browse/SOLR-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1761: --- Attachment: SOLR-1761.patch Here's a cleaned-up, committable version. Command line Solr check softwares - Key: SOLR-1761 URL: https://issues.apache.org/jira/browse/SOLR-1761 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: SOLR-1761.patch, SOLR-1761.patch
[jira] Created: (SOLR-1761) Command line Solr check softwares
Command line Solr check softwares - Key: SOLR-1761 URL: https://issues.apache.org/jira/browse/SOLR-1761 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 I'm in need of a command-line tool that Nagios and the like can execute to verify that a Solr server is working... Basically it'll be a jar with apps that return error codes if a given criterion isn't met.
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829200#action_12829200 ] Jason Rutherglen commented on SOLR-1301: In production the latest patch does not leave temporary files behind... Though before we had failed tasks, so perhaps there's still a bug, we won't know until we run out of disk space again. Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java
[jira] Updated: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1301: --- Attachment: SOLR-1301.patch I added the following to the SRW.close method's finally clause: {code} FileUtils.forceDelete(new File(temp.toString())); {code} Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java
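The surrounding SolrRecordWriter.close code is not quoted in this message, so the following is only a sketch of the pattern the comment describes: do the close work in a try block and delete the local temporary directory in the finally clause, so it is removed even when the work fails. It swaps in java.nio.file from the JDK in place of the commons-io FileUtils.forceDelete call so that it has no external dependencies; the names TempDirCleanup, CloseWork and closeAndCleanUp are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

final class TempDirCleanup {
    /** Hypothetical stand-in for the commit/optimize/pack work done on close. */
    interface CloseWork {
        void run() throws IOException;
    }

    /** Deletes a directory tree bottom-up; does nothing if the path is missing. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir); // the directory is empty by now
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /** Runs the close work and always removes the local temp directory afterwards. */
    static void closeAndCleanUp(CloseWork work, Path localTempDir) throws IOException {
        try {
            work.run();
        } finally {
            deleteRecursively(localTempDir);
        }
    }
}
```

One caveat of this shape: if both the close work and the cleanup throw, the cleanup exception is the one the caller sees.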
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828172#action_12828172 ] Jason Rutherglen commented on SOLR-1301: There's a bug caused by the latest change: {quote} java.io.IOException: java.lang.IllegalArgumentException: Wrong FS: hdfs://mi-prod-app01.ec2.biz360.com:9000/user/hadoop/solr/_attempt_201001212110_2841_r_01_0.1.index-a, expected: file:/// at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:371) at com.biz360.mi.index.hadoop.HadoopIndexer$ArticleReducer.reduce(HadoopIndexer.java:147) at com.biz360.mi.index.hadoop.HadoopIndexer$ArticleReducer.reduce(HadoopIndexer.java:103) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://mi-prod-app01.ec2.biz360.com:9000/user/hadoop/solr/_attempt_201001212110_2841_r_01_0.1.index-a, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) at org.apache.solr.hadoop.SolrRecordWriter.zipDirectory(SolrRecordWriter.java:459) at org.apache.solr.hadoop.SolrRecordWriter.packZipFile(SolrRecordWriter.java:390) at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:362) ... 5 more {quote} Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828368#action_12828368 ] Jason Rutherglen commented on SOLR-1301: I'm testing deleting the temp dir in SRW.close finally... Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java
[jira] Updated: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1301: --- Attachment: SOLR-1301.patch This update includes Kevin's recommended path change. Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1724: --- Attachment: gson-1.4.jar hadoop-0.20.2-dev-test.jar hadoop-0.20.2-dev-core.jar Hadoop and Gson dependencies Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804590#action_12804590 ] Jason Rutherglen commented on SOLR-1724: {quote}If you know your going to not store file data at nodes that have children (the only way that downloading to a real file system makes sense), you could just call getChildren - if there are children, its a dir, otherwise its a file. Doesn't work for empty dirs, but you could also just do getData, and if it returns null, treat it as a dir, else treat it as a file.{quote} Thanks Mark... Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: commons-lang-2.4.jar, SOLR-1724.patch
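The heuristic quoted above can be sketched as follows. To keep the example self-contained it is written against a tiny hypothetical NodeReader interface instead of the real ZooKeeper client, whose getChildren and getData methods take additional arguments (a watch flag or Watcher, and a Stat); the class and method names here are illustrative only. It treats a node as a directory if it has children or if its data is null, which relies on the convention that data is never stored at nodes that have children.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ZkTreeWalk {
    /** Hypothetical minimal view of a ZooKeeper client; not the real API. */
    interface NodeReader {
        List<String> getChildren(String path); // child names, relative to path
        byte[] getData(String path);           // null when the node stores no data
    }

    /** Heuristic from the comment: children, or null data, means directory. */
    static boolean looksLikeDirectory(NodeReader zk, String path) {
        return !zk.getChildren(path).isEmpty() || zk.getData(path) == null;
    }

    /** Recursively collects path -> data for every node treated as a file. */
    static Map<String, byte[]> collectFiles(NodeReader zk, String root) {
        Map<String, byte[]> files = new HashMap<>();
        walk(zk, root, files);
        return files;
    }

    private static void walk(NodeReader zk, String path, Map<String, byte[]> files) {
        if (looksLikeDirectory(zk, path)) {
            for (String child : zk.getChildren(path)) {
                String childPath = path.endsWith("/") ? path + child : path + "/" + child;
                walk(zk, childPath, files);
            }
        } else {
            files.put(path, zk.getData(path));
        }
    }
}
```

A real download tool would write each collected entry to the local filesystem and create directories for the remaining nodes. Whether an empty node reports null or an empty byte array depends on how it was created, so the null check may need adjusting.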
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804655#action_12804655 ] Jason Rutherglen commented on SOLR-1724: Need to have a command line tool that dumps the state of the existing cluster from ZK, out to a json file for a particular version. For my setup I'll have a program that'll look at this cluster state file and generate an input file that'll be written to ZK, which essentially instructs the Solr nodes to match the new cluster state. This allows me to easily write my own functionality that operates on the cluster that's external to deploying new software into Solr. Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: commons-lang-2.4.jar, SOLR-1724.patch
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804750#action_12804750 ] Jason Rutherglen commented on SOLR-1724: I did an svn update, though now am seeing the following error: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper within 5000 ms at org.apache.solr.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:131) at org.apache.solr.cloud.SolrZkClient.init(SolrZkClient.java:106) at org.apache.solr.cloud.SolrZkClient.init(SolrZkClient.java:72) at org.apache.solr.cloud.CoreControllerTest.testCores(CoreControllerTest.java:48) Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: commons-lang-2.4.jar, SOLR-1724.patch
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804760#action_12804760 ] Jason Rutherglen commented on SOLR-1724: The ZK port changed in ZkTestServer Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: commons-lang-2.4.jar, SOLR-1724.patch
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804773#action_12804773 ] Jason Rutherglen commented on SOLR-1724: For some reason ZkTestServer doesn't need to be shutdown any longer? Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5 Attachments: commons-lang-2.4.jar, SOLR-1724.patch
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803943#action_12803943 ] Jason Rutherglen commented on SOLR-1724: Do we have some code that recursively downloads a tree of files from ZK? The challenge is I don't see a way to find out if a given path represents a directory or not.
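On the directory question raised in the comment above: ZooKeeper does not distinguish files from directories, since any znode can hold both data and children. One possible convention, sketched below in Python, is to treat a znode with children as a directory and a childless znode as a file. The `FakeZk` class is a hypothetical in-memory stand-in, not the real client; with the ZooKeeper Java client the corresponding calls would be `getChildren` and `getData` (or `Stat.getNumChildren()` to test for children). Under this convention an empty "directory" znode comes out as an empty file, and data stored on a znode that also has children is ignored.

```python
import os
import tempfile


class FakeZk:
    """Hypothetical in-memory stand-in for a ZooKeeper client (illustration only)."""

    def __init__(self, nodes):
        # Maps an absolute znode path to its data bytes.
        self.nodes = nodes

    def get_data(self, path):
        return self.nodes[path]

    def get_children(self, path):
        # Return the names of the direct children of `path`, sorted.
        prefix = path.rstrip("/") + "/"
        names = set()
        for p in self.nodes:
            if p.startswith(prefix):
                rest = p[len(prefix):]
                if rest and "/" not in rest:
                    names.add(rest)
        return sorted(names)


def download_tree(zk, zk_path, local_path):
    """Copy the subtree rooted at zk_path to local_path; return the files written."""
    children = zk.get_children(zk_path)
    if not children:
        # Childless znode: treat it as a file.
        parent = os.path.dirname(local_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(local_path, "wb") as f:
            f.write(zk.get_data(zk_path))
        return [local_path]
    # Znode with children: treat it as a directory and recurse.
    os.makedirs(local_path, exist_ok=True)
    written = []
    for name in children:
        child_zk = zk_path.rstrip("/") + "/" + name
        written += download_tree(zk, child_zk, os.path.join(local_path, name))
    return written
```

Calling `download_tree(zk, "/conf", "/tmp/conf")` on a tree holding `/conf/schema.xml` and `/conf/lang/stopwords.txt` would write those two files under `/tmp/conf`.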
[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1724: --- Attachment: SOLR-1724.patch Here's the first cut... I agree, I'm not really into ephemeral ZK nodes for Solr hosts/nodes. The reason is that contact with ZK is highly superficial and can be intermittent. I'm mostly concerned with ensuring the core operations succeed on a given server. If a server goes down, there needs to be more than ZK to prove it, and if it goes down completely, I'll simply reallocate its cores to another server using the core management mechanism provided in this patch. The issue is still being worked on, specifically the Solr server portion that downloads the cores from some location, or performs operations. The file format will move to JSON.
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801215#action_12801215 ] Jason Rutherglen commented on SOLR-1724: Note to self: I need a way to upload an empty core/confdir from the command line, basically into ZK, then reference that core from ZK (I think this'll work?). I'd rather not rely on a separate HTTP server or something... The size of a jarred-up Solr conf dir shouldn't be too much for ZK?
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801216#action_12801216 ] Jason Rutherglen commented on SOLR-1724: Ted, thanks for the Katta link. This patch will likely de-emphasize the distributed search part, which is where the ephemeral node is used (i.e. a given server lists its current state). I basically want to take care of this one little deployment aspect of cores, improving on the wacky hackedy system I'm running today. Then, IF it works, I'll look at the distributed search part, hopefully in a totally separate patch.
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801244#action_12801244 ] Jason Rutherglen commented on SOLR-1724: This'll be a patch on the cloud branch to reuse what's been started. I don't see any core management code in there yet, so this looks complementary.
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800756#action_12800756 ] Jason Rutherglen commented on SOLR-1301: Andrzej's model works great in production. We have both 1) master-slave for incremental updates, and 2) indexing in Hadoop with this patch, after which we deploy each new core/shard in a balanced fashion to many servers. They're two separate modalities. The ZK stuff (as it's modeled today) isn't useful here, because I want the schema I indexed with as a part of the zip file stored in HDFS (or S3, or wherever). Any sort of ZK thingy is good for managing the cores/shards across many servers; however, Katta does this already (so we're reinventing the same thing, which is not necessarily bad if we also have a clear path for incremental indexing, as discussed above). Ultimately, the Solr server can be viewed as simply a container for cores, and the cloud + ZK branch as a manager of cores/shards. Anything more ambitious will probably be overkill, and this is what I believe Ted has been trying to get at. Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. 
Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning a Hadoop (key, value) pair into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When the reduce task completes and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue; you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under the Apache License. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
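The write path in the Design section above (convert each key/value pair, add it to a batch, submit the batch periodically, then commit and optimize on close) can be sketched roughly as follows. This is not the patch's SolrRecordWriter: `BatchingWriter`, `RecordingServer`, the `convert` callable and `batch_size` are hypothetical stand-ins, and the sketch assumes the converter returns exactly one document per pair.

```python
class RecordingServer:
    """Hypothetical stand-in for EmbeddedSolrServer that only records calls."""

    def __init__(self):
        self.calls = []

    def add(self, docs):
        self.calls.append(("add", list(docs)))

    def commit(self):
        self.calls.append(("commit",))

    def optimize(self):
        self.calls.append(("optimize",))


class BatchingWriter:
    """Buffers converted documents and submits them in batches."""

    def __init__(self, server, convert, batch_size=100):
        self.server = server          # stand-in for the embedded server
        self.convert = convert        # stand-in for SolrDocumentConverter
        self.batch_size = batch_size  # hypothetical tuning knob
        self.batch = []

    def write(self, key, value):
        # Convert one (key, value) pair and submit once the batch is full.
        self.batch.append(self.convert(key, value))
        if len(self.batch) >= self.batch_size:
            self._flush()

    def _flush(self):
        if self.batch:
            self.server.add(self.batch)
            self.batch = []

    def close(self):
        # Submit whatever is left, then commit and optimize once.
        self._flush()
        self.server.commit()
        self.server.optimize()
```

Batching keeps the number of submissions small, and deferring commit() and optimize() to close() means each reducer finalizes its shard exactly once.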
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800775#action_12800775 ] Jason Rutherglen commented on SOLR-1301: {quote}What I meant was the Hadoop job could simply know what the set of master indexers are and send the documents directly to them{quote} One can use Hadoop for this purpose; we have implemented the system this way for the incremental indexes. However, it doesn't require a separate patch or contrib module. The problem with the Hadoop streaming model is that it doesn't scale well if, for example, we need to reindex using the CJKAnalyzer, or using Basis' analyzer, etc. We use SOLR-1301 for reindexing loads of data as fast as possible by parallelizing the indexing. There are lots of little things I'd like to add to the functionality, though implementing ZK-based core management takes a higher priority, as I spend a lot of time doing this manually today.
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800802#action_12800802 ] Jason Rutherglen commented on SOLR-1301: bq. Hadoop streaming the output of the reduce tasks to the Solr indexing servers. Yes, this is what we've implemented; it's just normal Solr HTTP-based indexing, right? It works well to a limited degree, and for the particular implementation details, there are reasons why this can be less than ideal. The balanced, distributed shards/cores system works far better and enables us to use less hardware (but I'm not going into all the details here). One issue I can mention is the switchover to a new set of incremental servers (which happens when the old servers fill up). I'm looking to automate this, and will likely focus on it and the core management in the cloud branch.
[jira] Created: (SOLR-1724) Real Basic Core Management with Zookeeper
Real Basic Core Management with Zookeeper - Key: SOLR-1724 URL: https://issues.apache.org/jira/browse/SOLR-1724 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 1.4 Reporter: Jason Rutherglen Fix For: 1.5

Though we're implementing cloud, I need something real soon that I can play with and deploy. So this'll be a patch that only deploys new cores, and that's about it. The arch is real simple:

On Zookeeper there'll be a directory containing files that represent the state of the cores of a given set of servers, which will look like the following:

/production/cores-1.txt
/production/cores-2.txt
/production/core-host-1-actual.txt (ephemeral node per host)

Each core-N.txt file contains:

hostname,corename,instanceDir,coredownloadpath

where coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, etc.

core-host-actual.txt contains:

hostname,corename,instanceDir,size

Every time a new core-N.txt file is added, the listening host finds its entry in the list and begins the process of trying to match the entries. Upon completion, it updates its /core-host-1-actual.txt file to its completed state or logs an error. When all host actual files are written (without errors), a new core-1-actual.txt file is written, which can be picked up by another process that can create a new core proxy.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
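To make the "host finds its entry and tries to match" step above concrete, here is a minimal Python sketch. It assumes one comma-separated entry per line in the listing file, which the description does not spell out; `CoreEntry`, `parse_core_listing` and `cores_to_deploy` are hypothetical names, not code from the patch.

```python
from typing import NamedTuple


class CoreEntry(NamedTuple):
    hostname: str
    corename: str
    instance_dir: str
    download_path: str


def parse_core_listing(text):
    """Parse lines of hostname,corename,instanceDir,coredownloadpath."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # maxsplit=3 keeps any commas inside the download URL in the last field.
        hostname, corename, instance_dir, download_path = line.split(",", 3)
        entries.append(CoreEntry(hostname, corename, instance_dir, download_path))
    return entries


def cores_to_deploy(listing_text, this_host, installed_corenames):
    """Return this host's entries for cores that are not installed yet."""
    return [
        entry
        for entry in parse_core_listing(listing_text)
        if entry.hostname == this_host and entry.corename not in installed_corenames
    ]
```

A host would run `cores_to_deploy` whenever a new listing appears, fetch each returned `download_path`, and only then write its own "actual" file.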
[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800994#action_12800994 ] Jason Rutherglen commented on SOLR-1724: Additionally, upon successful completion of a core-version deployment to a set of nodes, a customizable deletion-policy-like mechanism will, by default, clean up the old cores on the system.
[jira] Commented: (SOLR-1720) replication configuration bug with multiple replicateAfter values
[ https://issues.apache.org/jira/browse/SOLR-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12799843#action_12799843 ] Jason Rutherglen commented on SOLR-1720: For consistency, maybe we should support comma-delimited lists? I edit the shards a lot (comma-delimited), which could use different elements as well, so by rote I just used commas for this, because it seemed like a Solr standard... Thanks for clarifying! replication configuration bug with multiple replicateAfter values - Key: SOLR-1720 URL: https://issues.apache.org/jira/browse/SOLR-1720 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Yonik Seeley Fix For: 1.5 Jason reported problems with multiple replicateAfter values - it worked after changing to just commit http://www.lucidimagination.com/search/document/e4c9ba46dc03b031/replication_problem -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
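As a rough illustration of the suggestion above, a configuration reader could accept both styles (one value per element, or a comma-delimited list in a single element) by normalizing them into one list. This is only a sketch of the idea, not how Solr's replication handler parses its configuration; `normalize_replicate_after` is a hypothetical helper.

```python
def normalize_replicate_after(values):
    """Flatten repeated and comma-delimited values into one de-duplicated list."""
    seen = set()
    result = []
    for raw in values:
        for token in raw.split(","):
            token = token.strip()
            if token and token not in seen:
                seen.add(token)
                result.append(token)
    return result
```

With this, `["commit,optimize"]` and `["commit", "optimize"]` would be treated the same way.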
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797898#action_12797898 ] Jason Rutherglen commented on SOLR-1709: Tim, Thanks for the patch... bq. as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company). TortoiseSVN works well on Windows, even for creating patches. Have you tried it? Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this: * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data) This could be dealt with if timezone and skew information was added, and the dates were normalized. 
One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized. The patch affects 2 files in the Solr core: org.apache.solr.handler.component.FacetComponent.java and org.apache.solr.handler.component.ResponseBuilder.java. The main changes are in FacetComponent - ResponseBuilder just holds the completed SimpleOrderedMap until the finishStage. One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments and suggestions welcome. As a favour, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
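The merge rule described in the issue (the first shard's facet_dates form the basis, and buckets that only appear on later shards are not merged in) can be sketched as below. The sketch models each shard's result as a plain bucket-to-count dict, which is a simplification of the real facet_dates structure, and `merge_facet_dates` is a hypothetical name rather than code from the patch.

```python
def merge_facet_dates(shard_results):
    """Merge per-shard date-facet counts, using the first shard's buckets as the basis."""
    if not shard_results:
        return {}
    merged = dict(shard_results[0])
    for shard in shard_results[1:]:
        for bucket, count in shard.items():
            if bucket in merged:
                merged[bucket] += count
            # Buckets the first shard did not return are skipped, so the
            # merged range never grows beyond what the first shard reported.
    return merged
```

Each later shard is checked against a single map, which is the performance argument made in the issue; the cost is that counts in skewed buckets are silently dropped.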
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793358#action_12793358 ] Jason Rutherglen commented on SOLR-1277: {quote}Zookeeper gives us the layout of the cluster. It doesn't seem like we need (yet) fast failure detection from zookeeper - other nodes can do this synchronously themselves (and would need to anyway) on things like connection failures. App-level timeouts should not mark the node as failed since we don't know how long the request was supposed to take.{quote} Google Chubby, when used in conjunction with search, sets a high timeout of 60 seconds, I believe? Fast failover is difficult, so it'll be best to enable fast re-requesting to adjacent slave servers on request failure. Mahadev has some good advice about how we can separate the logic into different znodes. Going further, I think we'll want to allow cores to register themselves, then listen to a separate directory as to what state each should be in. We'll need to ensure the architecture allows for defining multiple tiers (like a pyramid). At http://wiki.apache.org/solr/ZooKeeperIntegration, is a node a core or a server/corecontainer? To move ahead we'll really need to define and settle on the directory and file structure. I believe the requirement of grouping cores, so that one may issue a search against a group name instead of individual shard names, will be useful. The ability to move cores to different nodes will be necessary, as is the ability to replicate cores (i.e. have multiple copies available on different servers). Today I deploy lots of cores from HDFS across quite a few servers, containing 1.6 billion documents representing at least 2.4 TB of data. I mention this because a lot can potentially go wrong in this type of setup (i.e. servers going down, corrupted data, intermittent network, etc). I generate a file that contains all the information as to which core should go to which Solr server, using size-based balancing. Ideally I'd be able to generate a new file, perhaps for load balancing the cores across new Solr servers or to define that hot cores should be replicated, and the Solr cluster would move the cores to the defined servers automatically. This doesn't include the separate set of servers that handles incremental updates (i.e. master-slave). There's a bit of trepidation in moving forward on this because we don't want to engineer ourselves into a hole. However, if we need to change the structure of the znodes in the future, we'll need a healthy versioning plan such that one may upgrade a cluster while maintaining backwards compatibility on a live system. Let's think of a basic plan for this. In conclusion, let's iterate on the directory structure via the wiki or this issue? {quote}A search node can have very large caches tied to readers that all drop at once on commit, and can require a much larger heap to accommodate these caches. I think that's a more common scenario that creates these longer pauses.{quote} The large cache issue should be fixable with the various NRT changes (SOLR-1606). Collectively, they're not much different from the per-segment search and sort changes made in Lucene 2.9. 
Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
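The size based balancing mentioned in the comment above could look roughly like the sketch below. This is only an illustration, not the tool actually used to generate the layout file: the class and method names (`CoreBalancer`, `assign`) are hypothetical, and the greedy strategy (largest core first, onto the currently least-loaded server) is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: greedy size-based assignment of cores to servers.
// Names and strategy are assumptions, not taken from any Solr patch.
class CoreBalancer {

    /** Returns a map of server name to the list of core names assigned to it. */
    static Map<String, List<String>> assign(Map<String, Long> coreSizes, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        Map<String, List<String>> layout = new LinkedHashMap<>();
        Map<String, Long> load = new LinkedHashMap<>();
        for (String s : servers) {
            layout.put(s, new ArrayList<>());
            load.put(s, 0L);
        }
        // Place the largest cores first so smaller cores can even out the remainder.
        List<Map.Entry<String, Long>> cores = new ArrayList<>(coreSizes.entrySet());
        cores.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        for (Map.Entry<String, Long> core : cores) {
            // Pick the server with the smallest total size so far (first one wins ties).
            String target = servers.get(0);
            for (String s : servers) {
                if (load.get(s) < load.get(target)) {
                    target = s;
                }
            }
            layout.get(target).add(core.getKey());
            load.put(target, load.get(target) + core.getValue());
        }
        return layout;
    }
}
```

The resulting map could be serialized to the layout file, or, in the ZooKeeper design discussed here, written to a znode that the cores watch.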
[jira] Commented: (SOLR-1665) Add debugTimings param so that timings for components can be retrieved without having to do explains(), as in debugQuery
[ https://issues.apache.org/jira/browse/SOLR-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793474#action_12793474 ] Jason Rutherglen commented on SOLR-1665: Plus one, visibility into the components would be good. This'll work for distributed processes (i.e. time taken on each node per component)? Add debugTimings param so that timings for components can be retrieved without having to do explains(), as in debugQuery -- Key: SOLR-1665 URL: https://issues.apache.org/jira/browse/SOLR-1665 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 As the title says, it would be great if we could just get back component timings w/o having to do the full boat of explains and other stuff. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789600#action_12789600 ] Jason Rutherglen commented on SOLR-1506: There's a different bug here: because CoreContainer loads the cores sequentially and MultiCoreReaderFactory looks for all the cores, not all the cores are searchable when the proxy core isn't last, and an exception is thrown if the proxy is first. The workaround is to place the proxy core last, however that's not possible when using the core admin HTTP API. Hmm... Not sure what the best workaround is. Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787619#action_12787619 ] Jason Rutherglen commented on SOLR-1606: {quote}In any case, I assume it must not fsync the files, so you don't get a commit where you know your in a stable condition?{quote} OK, right, for the user commit currently means that after the call, the index is in a stable state, and that it can be replicated? I agree, for clarity, I'll create a refresh command and remove the NRT option from the commit command. Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787621#action_12787621 ] Jason Rutherglen commented on SOLR-1606: {quote}For example, q=foo&freshness=1000 would cause a new realtime reader to be opened if the current one was more than 1000ms old.{quote} Good idea. Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
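The freshness idea quoted above can be sketched as a small holder that reopens its reader only when the current one is older than the requested number of milliseconds. This is an assumption-laden illustration: `FreshReaderHolder` is a hypothetical name, the reader type is left generic, and the `opener` supplier stands in for whatever would actually obtain a new realtime reader (for example a call to IndexWriter.getReader).

```java
import java.util.function.Supplier;

// Hypothetical sketch of a per-request freshness check; not part of any Solr API.
class FreshReaderHolder<R> {
    private final Supplier<R> opener; // stand-in for obtaining a new realtime reader
    private R current;
    private long openedAtMs;

    FreshReaderHolder(Supplier<R> opener, long nowMs) {
        this.opener = opener;
        this.current = opener.get();
        this.openedAtMs = nowMs;
    }

    /** Returns the current reader, reopening first if it is older than freshnessMs. */
    synchronized R get(long nowMs, long freshnessMs) {
        if (nowMs - openedAtMs > freshnessMs) {
            current = opener.get();
            openedAtMs = nowMs;
        }
        return current;
    }
}
```

Passing the clock value in as a parameter keeps the sketch easy to test; a real integration would more likely read the system clock and would also need to release the reader being replaced.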
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787686#action_12787686 ] Jason Rutherglen commented on SOLR-1606: I was going to start on the auto-warming using IndexWriter's IndexReaderWarmer, however because this is heavily cache dependent I think it'll have to wait for SOLR-1308 because we need to regenerate the cache per reader. Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787800#action_12787800 ] Jason Rutherglen commented on SOLR-1606: The current NRT IndexWriter.getReader API cannot yet support IndexReaderFactory, I'll open a Lucene issue. Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-433) MultiCore and SpellChecker replication
[ https://issues.apache.org/jira/browse/SOLR-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787155#action_12787155 ] Jason Rutherglen commented on SOLR-433: --- Are the existing patches for multiple cores or only for spellchecking? MultiCore and SpellChecker replication -- Key: SOLR-433 URL: https://issues.apache.org/jira/browse/SOLR-433 Project: Solr Issue Type: Improvement Components: replication (scripts), spellchecker Affects Versions: 1.3 Reporter: Otis Gospodnetic Fix For: 1.5 Attachments: RunExecutableListener.patch, SOLR-433-r698590.patch, SOLR-433.patch, SOLR-433.patch, SOLR-433.patch, SOLR-433.patch, solr-433.patch, SOLR-433_unified.patch, spellindexfix.patch With MultiCore functionality coming along, it looks like we'll need to be able to: A) snapshot each core's index directory, and B) replicate any and all cores' complete data directories, not just their index directories. Pulled from the spellchecker and multi-core index replication thread - http://markmail.org/message/pj2rjzegifd6zm7m Otis: I think that makes sense - distribute everything for a given core, not just its index. And the spellchecker could then also have its data dir (and only index/ underneath really) and be replicated in the same fashion. Right? Ryan: Yes, that was my thought. If an arbitrary directory could be distributed, then you could have /path/to/dist/index/... /path/to/dist/spelling-index/... /path/to/dist/foo and that would all get put into a snapshot. This would also let you put multiple cores within a single distribution: /path/to/dist/core0/index/... /path/to/dist/core0/spelling-index/... /path/to/dist/core0/foo /path/to/dist/core1/index/... /path/to/dist/core1/spelling-index/... /path/to/dist/core1/foo -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787206#action_12787206 ] Jason Rutherglen commented on SOLR-1606: Koji, Looks like a change to trunk is causing the error, also when I step through it passes, when I run without stepping it fails... Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787221#action_12787221 ] Jason Rutherglen commented on SOLR-1606: bq. Don't we need a new command, like update_realtime We could however it'd work the same as commit? Meaning afterwards, all pending changes (including deletes) are available? The commit command is fairly overloaded as is. Are you thinking in terms of replication? Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1619) Cache documents by their internal ID
[ https://issues.apache.org/jira/browse/SOLR-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786233#action_12786233 ] Jason Rutherglen commented on SOLR-1619: Right, we'd somehow give the user either option. Cache documents by their internal ID Key: SOLR-1619 URL: https://issues.apache.org/jira/browse/SOLR-1619 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Currently documents are cached by their Lucene docid, however we can instead cache them using their schema derived unique id. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1308) Cache docsets at the SegmentReader level
[ https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786240#action_12786240 ] Jason Rutherglen commented on SOLR-1308: {quote} Yeah... that's a pain. We could easily do per-segment faceting for non-string types though (int, long, etc) since they don't need to be merged. {quote} I opened SOLR-1617 for this. I think doc sets can be handled with a multi doc set (hopefully). Facets however, argh, FacetComponent is really hairy, though I think it boils down to simply adding up the counts of identical field values? Then there seem to be edge cases which I'm scared of. At least it's easy to test whether we're fulfilling today's functionality by randomly unit testing per-segment and multi-segment side by side (i.e. if the results of one are different than the results of the other, we know there's something to fix). Perhaps we can initially add up field values, and test that (which is enough for my project), and move from there. I'd still like to genericize all of the distributed processes to work over multiple segments (like Lucene distributed search uses a MultiSearcher which also works locally), so that local or distributed is the same API wise. However, I've had trouble figuring out the existing distributed code (SOLR-1477 ran into a wall). Maybe as part of SolrCloud http://wiki.apache.org/solr/SolrCloud, we can rework the distributed APIs to be more user friendly (i.e. *MultiSearcher is really easy to understand). If Solr's going to work well in the cloud, distributed search probably needs to be easy to multi tier for scaling (i.e. if we have 1 proxy server and 100 nodes, we could have 1 top proxy, and 1 proxy per 10 nodes, etc). 
Cache docsets at the SegmentReader level Key: SOLR-1308 URL: https://issues.apache.org/jira/browse/SOLR-1308 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Original Estimate: 504h Remaining Estimate: 504h Solr caches docsets at the top level Multi*Reader level. After a commit, the filter/docset caches are flushed. Reloading the cache in near realtime (i.e. commits every 1s - 2min) unnecessarily consumes IO resources when reloading the filters, especially for largish indexes. We'll cache docsets at the SegmentReader level. The cache key will include the reader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
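A cache whose key includes the reader, as the description above proposes, might be shaped like the following sketch. It deliberately avoids Lucene and Solr types: `PerSegmentDocSetCache` is a hypothetical name, `java.util.BitSet` stands in for a docset, a plain string stands in for the filter query, and the `loader` function stands in for running the filter against one segment.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch: docsets cached per segment reader rather than per top-level reader.
class PerSegmentDocSetCache<R> {
    // Weak keys: once a segment reader is no longer referenced, its entries can be collected.
    private final Map<R, Map<String, BitSet>> cache = new WeakHashMap<>();

    /** Returns the cached docset for (segmentReader, filterKey), computing it on a miss. */
    synchronized BitSet get(R segmentReader, String filterKey, Function<R, BitSet> loader) {
        Map<String, BitSet> perReader = cache.computeIfAbsent(segmentReader, r -> new HashMap<>());
        BitSet docs = perReader.get(filterKey);
        if (docs == null) {
            docs = loader.apply(segmentReader);
            perReader.put(filterKey, docs);
        }
        return docs;
    }
}
```

After a commit, only the new segments miss the cache; unchanged segments keep their docsets, which is the IO saving the issue is after. Note that `WeakHashMap` compares keys with `equals`, so this assumes reader objects use identity equality.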
[jira] Commented: (SOLR-1308) Cache docsets and docs at the SegmentReader level
[ https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785433#action_12785433 ] Jason Rutherglen commented on SOLR-1308: I realized because of UnInvertedField, we'll need to merge facet results from UIF per reader, so using a MultiDocSet won't help. Can we leverage the distributed merging FacetComponent implements (i.e. reuse and/or change the code to work in both the distributed and local cases)? Ah well, I was hoping for an easy solution for realtime facets. Cache docsets and docs at the SegmentReader level - Key: SOLR-1308 URL: https://issues.apache.org/jira/browse/SOLR-1308 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Original Estimate: 504h Remaining Estimate: 504h Solr caches docsets and documents at the top level Multi*Reader level. After a commit, the caches are flushed. Reloading the caches in near realtime (i.e. commits every 1s - 2min) unnecessarily consumes IO resources, especially for largish indexes. We can cache docsets and documents at the SegmentReader level. The cache settings in SolrConfig can be applied to the individual SR caches. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1308) Cache docsets at the SegmentReader level
[ https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1308: --- Description: Solr caches docsets at the top level Multi*Reader level. After a commit, the filter/docset caches are flushed. Reloading the cache in near realtime (i.e. commits every 1s - 2min) unnecessarily consumes IO resources when reloading the filters, especially for largish indexes. We'll cache docsets at the SegmentReader level. The cache key will include the reader. was: Solr caches docsets and documents at the top level Multi*Reader level. After a commit, the caches are flushed. Reloading the caches in near realtime (i.e. commits every 1s - 2min) unnecessarily consumes IO resources, especially for largish indexes. We can cache docsets and documents at the SegmentReader level. The cache settings in SolrConfig can be applied to the individual SR caches. Summary: Cache docsets at the SegmentReader level (was: Cache docsets and docs at the SegmentReader level) I changed the title because we're not going to cache docs in this issue (though I think it's possible to cache docs by the internal id, rather than the doc id). Per-segment facet caching and merging per segment can go into a different issue. Cache docsets at the SegmentReader level Key: SOLR-1308 URL: https://issues.apache.org/jira/browse/SOLR-1308 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Original Estimate: 504h Remaining Estimate: 504h Solr caches docsets at the top level Multi*Reader level. After a commit, the filter/docset caches are flushed. Reloading the cache in near realtime (i.e. commits every 1s - 2min) unnecessarily consumes IO resources when reloading the filters, especially for largish indexes. We'll cache docsets at the SegmentReader level. The cache key will include the reader. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1617) Cache and merge facets per segment
Cache and merge facets per segment -- Key: SOLR-1617 URL: https://issues.apache.org/jira/browse/SOLR-1617 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Spinoff from SOLR-1308. We'll enable per-segment facet caching and merging which will allow near realtime faceted searching. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
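In its simplest form, merging facets per segment amounts to summing the counts each segment reports for the same field value. The sketch below shows only that base case; `FacetMerger` is a hypothetical name, and real facet merging in Solr also has to deal with limits, sorting, and the edge cases mentioned in the SOLR-1308 discussion.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sum per-segment facet counts for identical field values.
class FacetMerger {

    /** Merges per-segment maps of field value to count into one sorted map of totals. */
    static Map<String, Integer> merge(List<Map<String, Integer>> perSegmentCounts) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> segment : perSegmentCounts) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                // Add this segment's count to the running total for the value.
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }
}
```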
[jira] Created: (SOLR-1618) Merge docsets on segment merge
Merge docsets on segment merge -- Key: SOLR-1618 URL: https://issues.apache.org/jira/browse/SOLR-1618 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 When SOLR-1308 is implemented, we can save some time when creating new docsets by merging them in RAM as segments are merged (similar to LUCENE-1785) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1619) Cache documents by their internal ID
Cache documents by their internal ID Key: SOLR-1619 URL: https://issues.apache.org/jira/browse/SOLR-1619 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Currently documents are cached by their Lucene docid, however we can instead cache them using their schema derived unique id. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784973#action_12784973 ] Jason Rutherglen commented on SOLR-1277: If we're detecting node failure, it seems the functionality of Solr should also be detected for failure. The discussions thus far seem to be around network or process failure which is usually either intermittent or terminal. Detecting measurable increase/decreases in CPU, RAM consumption, OOMs, query failures, indexing failures due to bugs are probably more important than the network being down because they are harder to detect and fix. How is HBase handling the detection of functional issues in relation to ZK? Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? 
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785014#action_12785014 ] Jason Rutherglen commented on SOLR-1277: bq. The question then becomes what do you want to make automatic vs those things that require operator intervention. Right, I'd like the distributed Solr + ZK system to automatically failover to another server if there's a functional software failure. Also, with a search system query times are very important and if they suddenly drop off on a replicated server, the node needs to be removed and a new server brought online (hopefully automatically). If Solr + ZK doesn't take out a server whose query times are 10 times the average of the other comparable replicated slave servers, then it's harder to justify going live with it, in my humble opinion, because it's not really solving the main reason to use a naming service. While this may not be functionality we need in an initial release, it's important to ensure our initial design does not limit future functionality. Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. 
new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
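The rule of thumb in the comment above (take out a server whose query times are 10 times the average of its comparable replicas) could be checked along these lines. This is a sketch under stated assumptions: `SlowNodeDetector` is a hypothetical name, the input is assumed to be a recent average query time per node in milliseconds, and how those numbers are collected or published (for example through ZooKeeper) is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flag replicas whose query time far exceeds that of their peers.
class SlowNodeDetector {

    /** Returns nodes whose average query time is more than factor times the average of the others. */
    static List<String> findSlowNodes(Map<String, Double> avgQueryMs, double factor) {
        List<String> slow = new ArrayList<>();
        if (avgQueryMs.size() < 2) {
            return slow; // nothing to compare against
        }
        double sum = 0;
        for (double v : avgQueryMs.values()) {
            sum += v;
        }
        for (Map.Entry<String, Double> e : avgQueryMs.entrySet()) {
            // Exclude the node itself so one outlier does not inflate its own baseline.
            double othersAvg = (sum - e.getValue()) / (avgQueryMs.size() - 1);
            if (e.getValue() > factor * othersAvg) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }
}
```

Whether a flagged node is removed automatically or only reported to an operator is the policy question raised in the quoted comment; the sketch covers detection only.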
[jira] Commented: (SOLR-1308) Cache docsets and docs at the SegmentReader level
[ https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784668#action_12784668 ] Jason Rutherglen commented on SOLR-1308: I'm taking a look at this. It's straightforward to cache and reuse docsets per reader in SolrIndexSearcher, however, we're passing docsets all over the place (i.e. UnInvertedField). We can't exactly rip out DocSet without breaking most unit tests, and writing a bunch of facet merging code. We'd likely lose functionality? Will the MultiDocSet concept (SOLR-568) work as an easy way to get something that works up and running? Then we can benchmark and see if we've lost performance? Cache docsets and docs at the SegmentReader level - Key: SOLR-1308 URL: https://issues.apache.org/jira/browse/SOLR-1308 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Original Estimate: 504h Remaining Estimate: 504h Solr caches docsets and documents at the top level Multi*Reader level. After a commit, the caches are flushed. Reloading the caches in near realtime (i.e. commits every 1s - 2min) unnecessarily consumes IO resources, especially for largish indexes. We can cache docsets and documents at the SegmentReader level. The cache settings in SolrConfig can be applied to the individual SR caches. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1614) Search in Hadoop
Search in Hadoop Key: SOLR-1614 URL: https://issues.apache.org/jira/browse/SOLR-1614 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 What's the use case? Sometimes queries are expensive (such as regex) or one has indexes located in HDFS, that then need to be searched on. By leveraging Hadoop, these non-time sensitive queries may be executed without dynamically deploying the indexes to new Solr servers. We'll download the index out of HDFS (assuming they're zipped), perform the queries in a batch on the index shard, then merge the results either using a Solr query results priority queue, or simply using Hadoop's built in merge sorting. The query file will be encoded in JSON format, (ID, query, numresults,fields). The shards file will simply contain newline delimited paths (HDFS or otherwise). The output can be a Solr encoded results file per query. I'm hoping to add an actual Hadoop unit test. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
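The merge step described above, combining per-shard results through a priority queue, can be illustrated with a bounded min-heap that keeps only the best `n` hits. This is a generic sketch, not Solr's or Hadoop's merge code: `ShardResultMerger` and `Hit` are hypothetical names, and a hit is reduced to an id and a score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merge per-shard hits into a global top n by score.
class ShardResultMerger {

    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns the n highest-scoring hits across all shards, best first. */
    static List<Hit> mergeTopN(List<List<Hit>> perShardHits, int n) {
        // Min-heap on score: the weakest of the current top n sits at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shard : perShardHits) {
            for (Hit h : shard) {
                heap.add(h);
                if (heap.size() > n) {
                    heap.poll(); // drop the lowest-scoring hit
                }
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

The heap never holds more than `n + 1` hits, so memory stays bounded however many shards are merged. The alternative named in the description, relying on Hadoop's built-in merge sort, would push this ordering into the shuffle phase instead.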
[jira] Created: (SOLR-1609) Create a cache implementation that limits itself to a given RAM size
Create a cache implementation that limits itself to a given RAM size Key: SOLR-1609 URL: https://issues.apache.org/jira/browse/SOLR-1609 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 This is a spinoff from the unrelated SOLR-1308. We can limit the cache sizes by estimated RAM usage. I think in some cases this is a better approach when compared with using soft references as this will effectively limit the cache RAM used. Soft references will utilize the max heap before divesting itself of excessive cached items, which in some cases may not be the desired behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
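A cache bounded by estimated RAM rather than entry count might look like the sketch below. It is an assumption, not the SOLR-1609 patch: `RamLimitedCache` is a hypothetical name, eviction is least-recently-used, and the per-entry size comes from a caller-supplied estimator, since the issue does not say how RAM usage would be estimated.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch: LRU cache that evicts by estimated bytes instead of entry count.
class RamLimitedCache<K, V> {
    private final long maxBytes;
    private final ToLongFunction<V> sizer; // caller-supplied size estimate per value
    // accessOrder=true makes iteration run from least to most recently used.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private long usedBytes;

    RamLimitedCache(long maxBytes, ToLongFunction<V> sizer) {
        this.maxBytes = maxBytes;
        this.sizer = sizer;
    }

    synchronized void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null) {
            usedBytes -= sizer.applyAsLong(old);
        }
        usedBytes += sizer.applyAsLong(value);
        // Evict least recently used entries until the estimate fits the budget.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            usedBytes -= sizer.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

Unlike soft references, this keeps the cache within its budget at all times rather than waiting for heap pressure. A value larger than the whole budget is evicted immediately, which seems like the least surprising behavior for a sketch.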
[jira] Created: (SOLR-1610) Add generics to SolrCache
Add generics to SolrCache - Key: SOLR-1610 URL: https://issues.apache.org/jira/browse/SOLR-1610 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Seems fairly simple for SolrCache to have generics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1610) Add generics to SolrCache
[ https://issues.apache.org/jira/browse/SOLR-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1610: --- Attachment: SOLR-1610.patch Compiles, ran some of the unit tests. Not sure what else needs to be done? Add generics to SolrCache - Key: SOLR-1610 URL: https://issues.apache.org/jira/browse/SOLR-1610 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1610.patch Seems fairly simple for SolrCache to have generics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1606) Integrate Near Realtime
Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1606: --- Attachment: SOLR-1606.patch Solr config can have an index nrt (true|false), or commit can specify the nrt var. With nrt=true, when creating a new searcher we call getReader. Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1578) Develop a Spatial Query Parser
[ https://issues.apache.org/jira/browse/SOLR-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780184#action_12780184 ] Jason Rutherglen commented on SOLR-1578: GBase http://code.google.com/apis/base/docs/2.0/query-lang-spec.html (Locations section at the bottom of the page) has a query syntax for spatial queries (i.e. @+40.75-074.00 + 5mi) Develop a Spatial Query Parser -- Key: SOLR-1578 URL: https://issues.apache.org/jira/browse/SOLR-1578 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: 1.5 Given all the work around spatial, it would be beneficial if Solr had a query parser for dealing with spatial queries. For starters, something that used geonames data or maybe even Google Maps API would be really useful. Longer term, a spatial grammar that can robustly handle all the vagaries of addresses, etc. would be really cool. Refs: [1] http://www.geonames.org/export/client-libraries.html (note the Java client is ASL) [2] Data from geo names: http://download.geonames.org/export/dump/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1506: --- Attachment: SOLR-1506.patch MultiReader doesn't support reopen with the readOnly parameter. This patch adds a test case for commit on the proxy, and a workaround (if unsupported is caught, then regular reopen is called). Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
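The workaround described in this update (try the reopen that takes the readOnly parameter, and fall back to a regular reopen if it is unsupported) has roughly the following shape. To keep the sketch self-contained it does not use Lucene classes: `Reopenable` is a hypothetical stand-in interface for the reader, and `ReopenFallback` is a hypothetical helper, not the code in the attached patch.

```java
// Hypothetical sketch of the fallback; Reopenable stands in for the reader type.
class ReopenFallback {

    interface Reopenable<R> {
        R reopen(boolean readOnly); // may throw UnsupportedOperationException
        R reopen();
    }

    /** Tries the readOnly-aware reopen first, then falls back to the plain reopen. */
    static <R> R reopen(Reopenable<R> reader, boolean readOnly) {
        try {
            return reader.reopen(readOnly);
        } catch (UnsupportedOperationException e) {
            // The reader (for example a MultiReader over cores) lacks the readOnly variant.
            return reader.reopen();
        }
    }
}
```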
[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773104#action_12773104 ] Jason Rutherglen commented on SOLR-1506: Commit doesn't work because reopen isn't supported by MultiReader. Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12772793#action_12772793 ] Jason Rutherglen commented on SOLR-1506: There's a bug here with getting the status of multiple cores:
SEVERE: org.apache.solr.common.SolrException: Error handling 'status' action
    at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:362)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:131)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: This reader does not support this method.
    at org.apache.lucene.index.IndexReader.directory(IndexReader.java:592)
    at org.apache.solr.search.SolrIndexReader.directory(SolrIndexReader.java:222)
    at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:442)
    at org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:449)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:353)
Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12771704#action_12771704 ] Jason Rutherglen commented on SOLR-1395: Pravin, I'll review the test case when I can. Did you download and apply the latest patch? Integrate Katta --- Key: SOLR-1395 URL: https://issues.apache.org/jira/browse/SOLR-1395 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar Original Estimate: 336h Remaining Estimate: 336h We'll integrate Katta into Solr so that: * Distributed search uses Hadoop RPC * Shard/SolrCore distribution and management * Zookeeper based failover * Indexes may be built using Hadoop -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1477) Search on multi-tier cores
[ https://issues.apache.org/jira/browse/SOLR-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767594#action_12767594 ] Jason Rutherglen commented on SOLR-1477: The use case is scaling to hundreds of servers where a single distributed search proxy server becomes a bottleneck, or simply querying multiple local cores. Either way, the same multi-tiered distributed search module will be highly effective. Search on multi-tier cores -- Key: SOLR-1477 URL: https://issues.apache.org/jira/browse/SOLR-1477 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch Search on cores in the container, using distributed search. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1477) Search on multi-tier cores
[ https://issues.apache.org/jira/browse/SOLR-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767600#action_12767600 ] Jason Rutherglen commented on SOLR-1477: The way the process should work for this patch is:
1) Incoming query to the shard proxy server
2) getids is passed to N intermediary proxy servers (i-proxies)
3) The i-proxies forward the getids call to Y Solr servers
4) The Y Solr servers respond; each i-proxy merges the ids and sends the response to the top-level proxy server from step 1)
5) The top-level proxy merges the results from the i-proxies
6) getdocs is passed from the top-level proxy of step 1) to the i-proxies
7) The i-proxies call the Solr servers to obtain documents (the actual shard the documents exist on needs to be passed to the i-proxy to avoid redundancy)
8) The i-proxies send the results of getdocs to the top-level proxy
9) The request is completed.
I know that's muddy but it's a start. Search on multi-tier cores -- Key: SOLR-1477 URL: https://issues.apache.org/jira/browse/SOLR-1477 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch Search on cores in the container, using distributed search. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
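The getids/getdocs flow in the SOLR-1477 comment above can be sketched as two small operations: a merge that both the i-proxies (step 4) and the top-level proxy (step 5) can reuse, and a grouping of the winning ids by shard so that getdocs (steps 6 and 7) only touches shards that actually hold a result. This is a minimal sketch under stated assumptions: TieredMerge and Hit are hypothetical names, ranking is by score only, and no networking or Solr API is involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for a two-tier getids/getdocs flow; not Solr code. */
final class TieredMerge {
    private TieredMerge() {}

    /** One getids hit: the document id, its score, and the shard that holds it. */
    static final class Hit {
        final String id;
        final double score;
        final String shard;

        Hit(String id, double score, String shard) {
            this.id = id;
            this.score = score;
            this.shard = shard;
        }
    }

    /**
     * Steps 4 and 5: merge the hit lists returned by the tier below and keep
     * the best `rows` hits by descending score. The same method serves an
     * i-proxy merging Solr servers and the top-level proxy merging i-proxies.
     */
    static List<Hit> merge(List<List<Hit>> responses, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> response : responses) {
            all.addAll(response);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(rows, all.size())));
    }

    /**
     * Steps 6 and 7: group the winning ids by shard so each getdocs call is
     * sent only to the shard the documents live on.
     */
    static Map<String, List<String>> idsByShard(List<Hit> winners) {
        Map<String, List<String>> byShard = new LinkedHashMap<>();
        for (Hit hit : winners) {
            byShard.computeIfAbsent(hit.shard, k -> new ArrayList<>()).add(hit.id);
        }
        return byShard;
    }
}
```

Carrying the shard name on every hit is what lets the top-level proxy tell each i-proxy exactly where to fetch documents, which is the redundancy concern raised in step 7.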
[jira] Updated: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1301: --- Attachment: SOLR-1301.patch Here's an update that includes the change Jason mentioned above (needHeartBeat in SRW.close). I've run this patch in production; however, I could not turn off the Solr update logs because of complexities in how SLF4J layers with Hadoop's logging. I had to comment out the logging lines in Solr to ensure the Hadoop logs did not fill up. Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. 
When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
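The design in the SOLR-1301 description above (convert each key/value pair to a document, collect documents in a batch, submit the batch periodically, then commit and optimize on close) can be sketched without any Hadoop or Solr dependency. Everything here is an assumption made for illustration: IndexSink stands in for EmbeddedSolrServer, the BiFunction stands in for SolrDocumentConverter, and the fixed batch size is a guess at what "periodically" means. It is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical stand-in for the embedded server; not the SolrJ API. */
interface IndexSink<D> {
    void add(List<D> docs);
    void commit();
    void optimize();
}

/** Hypothetical record writer that batches converted documents before submitting them. */
final class BatchingWriter<K, V, D> {
    private final BiFunction<K, V, D> converter; // plays the role of the document converter
    private final IndexSink<D> sink;
    private final int batchSize;
    private final List<D> batch = new ArrayList<>();

    BatchingWriter(BiFunction<K, V, D> converter, IndexSink<D> sink, int batchSize) {
        this.converter = converter;
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Converts one reduce output pair and submits the batch once it is full. */
    void write(K key, V value) {
        batch.add(converter.apply(key, value));
        if (batch.size() >= batchSize) {
            flush();
        }
    }

    /** Submits any remaining documents, then commits and optimizes the index. */
    void close() {
        flush();
        sink.commit();
        sink.optimize();
    }

    private void flush() {
        if (batch.isEmpty()) {
            return;
        }
        sink.add(new ArrayList<>(batch)); // copy so the sink never sees later mutations
        batch.clear();
    }
}
```

Because each reducer owns its own writer and sink, every reduce task produces an independent shard, which matches the part-N output directories described in the issue.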
[jira] Updated: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1506: --- Attachment: SOLR-1506.patch Fixes a bug, added Apache headers Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1477) Search on multi-tier cores
[ https://issues.apache.org/jira/browse/SOLR-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1477: --- Priority: Minor (was: Trivial) Summary: Search on multi-tier cores (was: Search on local cores) Search on multi-tier cores -- Key: SOLR-1477 URL: https://issues.apache.org/jira/browse/SOLR-1477 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch Search on cores in the container, using distributed search. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
[ https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766641#action_12766641 ] Jason Rutherglen commented on SOLR-1513: Noble, before implementing, I was wondering whether there's performance testing code for ConcurrentLRUCache, in case Google Collections somehow slows things down? Use Google Collections in ConcurrentLRUCache Key: SOLR-1513 URL: https://issues.apache.org/jira/browse/SOLR-1513 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 ConcurrentHashMap is used in ConcurrentLRUCache. The Google Collections concurrent map implementation allows for soft values that are great for caches that potentially exceed the allocated heap. Though I suppose Solr caches usually don't use too much RAM? http://code.google.com/p/google-collections/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
[ https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1513: --- Attachment: google-collect-snapshot.jar SOLR-1513.patch Here's a basic implementation. It needs testing for performance and for what happens if a value is removed before its key (in which case the map could return null?). There are a number of configurable params, so we'll add those as options in solrconfig. Use Google Collections in ConcurrentLRUCache Key: SOLR-1513 URL: https://issues.apache.org/jira/browse/SOLR-1513 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: google-collect-snapshot.jar, SOLR-1513.patch ConcurrentHashMap is used in ConcurrentLRUCache. The Google Collections concurrent map implementation allows for soft values that are great for caches that potentially exceed the allocated heap. Though I suppose Solr caches usually don't use too much RAM? http://code.google.com/p/google-collections/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
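To make the "value removed before its key" question from the SOLR-1513 update above concrete, here is a minimal sketch of a soft-value cache built only on the JDK. It is not the Google Collections implementation and not the attached patch; SoftValueCache is a hypothetical class that shows one way a lookup can treat a collected value as a plain cache miss.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical soft-value cache using JDK classes only; not Google Collections. */
final class SoftValueCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    /** Stores the value behind a SoftReference so the GC may reclaim it under memory pressure. */
    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    /**
     * Returns the cached value, or null on a miss. A key whose value has been
     * collected is treated as a miss and its stale entry is removed.
     */
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // Remove only if the mapping is still this exact reference, so a
            // concurrent put of a fresh value is not discarded.
            map.remove(key, ref);
        }
        return value;
    }

    /** Number of keys currently mapped, including entries not yet cleaned up. */
    int size() {
        return map.size();
    }
}
```

Callers already have to handle null from a cache lookup, so under this design a collected value needs no special case on the read path; whether the real map behaves the same way is exactly what the comment says still needs testing.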
[jira] Created: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
Use Google Collections in ConcurrentLRUCache Key: SOLR-1513 URL: https://issues.apache.org/jira/browse/SOLR-1513 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 ConcurrentHashMap is used in ConcurrentLRUCache. The Google Collections concurrent map implementation allows for soft values that are great for caches that potentially exceed the allocated heap. Though I suppose Solr caches usually don't use too much RAM? http://code.google.com/p/google-collections/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
[ https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766356#action_12766356 ] Jason Rutherglen commented on SOLR-1513: I've tuned down my caches to avoid OOMs and swapping. I'd rather the cache simply removed values before swapping or OOMs occur. I think it would simply be an option, which I'd personally always have on! Use Google Collections in ConcurrentLRUCache Key: SOLR-1513 URL: https://issues.apache.org/jira/browse/SOLR-1513 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 ConcurrentHashMap is used in ConcurrentLRUCache. The Google Collections concurrent map implementation allows for soft values that are great for caches that potentially exceed the allocated heap. Though I suppose Solr caches usually don't use too much RAM? http://code.google.com/p/google-collections/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12764745#action_12764745 ] Jason Rutherglen commented on SOLR-1301: Thanks for the update Jason. It runs great, I've generated over a terabyte of indexes using the patch. Now I'm trying to deploy them, and that's harder! Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 1.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. 
The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1506) Search multiple cores using MultiReader
Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-1506: --- Attachment: SOLR-1506.patch Well, it seems to work, though I had to comment out the reader.directory() call in SolrCore. I'm not sure what to do there yet, but this is good enough for now. Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1502) Add form to perform updates
Add form to perform updates --- Key: SOLR-1502 URL: https://issues.apache.org/jira/browse/SOLR-1502 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 A convenience UI to perform updates via the Web UI. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.