
[jira] Commented: (SOLR-1375) BloomFilter on a field

2010-03-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851637#action_12851637
]

Jason Rutherglen commented on SOLR-1375:

{quote}Doesn't this hint at some of this stuff (haven't looked at the patch)
really needing to live in Lucene index segment files merging land?{quote}

Adding this to Lucene is outside the scope of what I require, and I don't
have time for it unless it's going to be committed.

BloomFilter on a field
--

Key: SOLR-1375
URL: https://issues.apache.org/jira/browse/SOLR-1375
Project: Solr
Issue Type: New Feature
Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5

Attachments: SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch,
SOLR-1375.patch, SOLR-1375.patch

Original Estimate: 120h
Remaining Estimate: 120h

* A bloom filter is a read-only probabilistic set. It's useful
for verifying that a key exists in a set, though it can return
false positives. http://en.wikipedia.org/wiki/Bloom_filter
* The use case is indexing in Hadoop and checking for duplicates
against a Solr cluster, which (when using the term dictionary or
a query) is too slow and exceeds the time consumed for indexing.
When a match is found, the host, segment, and term are returned.
If the same term is found on multiple servers, multiple results
are returned by the distributed process. (I just realized we'll
need to add in the core name.)
* When new segments are created and commit is called, a new
bloom filter is generated from a given field (default: id) by
iterating over the term dictionary values. There's a bloom
filter file per segment, which is managed on each Solr shard.
When segments are merged away, their corresponding .blm files
are also removed. In a future version we'll have a central server
for the bloom filters so we're not abusing the thread pool of
the Solr proxy and the networking of the Solr cluster (this will
be done sooner rather than later, after testing this version). I
held off because the central server requires syncing the Solr
servers' files (which is like reverse replication).
* The patch uses the BloomFilter from Hadoop 0.20. I want to jar
up only the necessary classes so we don't have a giant Hadoop
jar in lib.
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
* Distributed code is added and seems to work. I extended
TestDistributedSearch to test over multiple HTTP servers. I
chose this approach rather than the manual method used by (for
example) TermVectorComponent.testDistributed because I'm new to
Solr's distributed search and wanted to learn how it works (the
stages are confusing). Using this method, I didn't need to set up
multiple Tomcat servers and manually execute tests.
* We need more of the bloom filter options passable via
solrconfig.
* I'll add more test cases.
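
To make the per-segment idea above concrete, here is a minimal Python sketch. It is not the patch's code (the patch uses Hadoop's BloomFilter class); the class, the function names, the bit-array size, and the double-hashing scheme are all assumptions chosen for illustration. It builds one filter per segment from a list of terms and returns (host, segment, term) for every filter that may contain the term.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hashed positions per key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: derive k positions from one SHA-256 digest using
        # double hashing. Hadoop's implementation hashes differently.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def build_segment_filter(terms, num_bits=1024, num_hashes=3):
    """Build one filter from the terms of a single segment's field."""
    bf = BloomFilter(num_bits, num_hashes)
    for term in terms:
        bf.add(term)
    return bf


def lookup(filters, term):
    """filters maps (host, segment) -> BloomFilter (hypothetical layout)."""
    return [(host, segment, term)
            for (host, segment), bf in sorted(filters.items())
            if bf.might_contain(term)]
```

A hit from lookup only says the term may be present in that segment, so a caller that needs certainty would still confirm against the index. A miss is definitive, and misses are what make the duplicate check cheap.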

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-03-02 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Fixed the unit tests that were failing due to the switch over to using 
CoreContainer's initZooKeeper method.  ZkNodeCoresManager is instantiated in 
CoreContainer.  

There's the beginning of a UI in zkcores.jsp.

I think we still need a core move test.  I'm thinking of adding core backup 
as an action that may be performed in a new cores version file.  

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch


 Though we're implementing cloud, I need something real soon I can
 play with and deploy. So this'll be a patch that only deploys
 new cores, and that's about it. The arch is real simple:
 On Zookeeper there'll be a directory that contains files that
 represent the state of the cores of a given set of servers which
 will look like the following:
 /production/cores-1.txt
 /production/cores-2.txt
 /production/core-host-1-actual.txt (ephemeral node per host)
 Where each core-N.txt file contains:
 hostname,corename,instanceDir,coredownloadpath
 coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
 etc
 and
 core-host-actual.txt contains:
 hostname,corename,instanceDir,size
 Every time a new core-N.txt file is added, the listening host
 finds its entry in the list and begins the process of trying to
 match the entries. Upon completion, it updates its
 /core-host-1-actual.txt file to its completed state or logs an error.
 When all host actual files are written (without errors), then a
 new core-1-actual.txt file is written which can be picked up by
 another process that can create a new core proxy.
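
 The matching step described above could look roughly like the following
 Python sketch. It is only an illustration under stated assumptions: the
 CoreEntry type and all function names are hypothetical, the file is
 assumed to hold one comma-separated entry per line in the field order
 given above, and reading the file from Zookeeper is left out entirely.

```python
from collections import namedtuple

# Field order follows the description: hostname,corename,instanceDir,coredownloadpath
CoreEntry = namedtuple("CoreEntry", "hostname corename instance_dir download_path")


def parse_cores_file(text):
    """Parse the contents of one cores file into CoreEntry records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        # maxsplit=3 keeps any commas inside the download URL intact
        hostname, corename, instance_dir, download_path = line.split(",", 3)
        entries.append(CoreEntry(hostname, corename, instance_dir, download_path))
    return entries


def entries_for_host(entries, hostname):
    """Return the entries a given host is responsible for."""
    return [e for e in entries if e.hostname == hostname]


def cores_to_install(desired, installed_names):
    """Desired cores that are not yet installed on this host."""
    return [e for e in desired if e.corename not in installed_names]
```

 A host would call entries_for_host with its own name, install whatever
 cores_to_install returns, and only then write its actual file.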


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-03-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839937#action_12839937
]

Jason Rutherglen commented on SOLR-1724:

I'm starting work on the cores file upload. The cores file is in JSON format,
and can be assembled by an entirely different process (i.e. the core assignment
creation is decoupled from core deployment).

I need to figure out how Solr HTML HTTP file uploading works... There's
probably an example somewhere.


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-28 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839520#action_12839520
 ] 

Jason Rutherglen commented on SOLR-1724:


Started on having the nodes report their status to separate files that are 
ephemeral nodes. There's no sense in keeping them around if the node isn't up, 
and the status is legitimately ephemeral.  In this case, the status will be 
something like "Core download 45% (7 GB of 15 GB)".  
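
A status string of that shape could be produced along these lines. This is a small sketch only: the function name and the exact wording of the message are assumptions, and writing the string to the ephemeral node is not shown.

```python
GB = 1024 ** 3


def format_download_status(done_bytes, total_bytes):
    """Render download progress as a short human-readable status line."""
    if total_bytes <= 0:
        # Size not known yet; avoid dividing by zero.
        return "Core download 0% (size unknown)"
    percent = done_bytes * 100 // total_bytes  # integer percent, rounded down
    return "Core download %d%% (%d GB of %d GB)" % (
        percent, done_bytes // GB, total_bytes // GB)
```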


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-26 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838926#action_12838926
]

Jason Rutherglen commented on SOLR-1724:

In thinking about this some more, in order for the functionality
provided in this issue to be more useful, there could be a web
based UI to easily view the master cores table. There can
additionally be an easy way to upload the new cores version into
Zookeeper. I'm not sure if the uploading should be web based or
command line. I'm figuring web based, simply because this is
more in line with the rest of Solr.

As a core is installed or is in the midst of some other process
(such as backing itself up), the node/NodeCoresManager can
report the ongoing status to Zookeeper. For large cores (e.g. 20
GB) it's important to see how they're doing, and if they're
taking too long, begin some remedial action. The UI can display
the statuses.


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-25 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Zipping from a Lucene directory works and has a test case.

A ReplicationHandler is added by default under a unique name. If one exists 
already, we still create our own, for the express purpose of locking an index 
commit point, zipping it, then uploading it to, for example, HDFS.  This part 
will likely be written next.
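
The zipping step on its own can be sketched in a few lines of Python. This is not the patch's code and it covers only the zip: it does not lock a commit point through a ReplicationHandler and does not upload anything. The function name is hypothetical, and it assumes the index directory is flat (files only) and that the zip is written outside the directory being zipped.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Zip the regular files directly inside src_dir into zip_path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(src_dir)):
            full_path = os.path.join(src_dir, name)
            if os.path.isfile(full_path):
                # Store entries by bare file name so the archive
                # unpacks directly into a new index directory.
                zf.write(full_path, arcname=name)
    return zip_path
```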


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-25 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Backing a core up works, at least according to the test case... I will probably 
begin to test this patch in a staging environment next, where Zookeeper is run 
in its own process and a real HDFS cluster is used.


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-24 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837898#action_12837898
 ] 

Jason Rutherglen commented on SOLR-1724:


I'm not sure how we'll handle (or if we even need to) installing
a new core over an existing core of the same name, in other
words core replacement. I think the instanceDir would need to be
different, which means we'll need to detect and fail on the case
of a new cores version (aka desired state) trying to install
itself into an existing core's instanceDir. Otherwise this
potential error case is costly in production. 
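
One way the detect-and-fail check could look, as a Python sketch under assumptions: new_entries holds only the cores a host is about to install from the new cores version, given as (hostname, corename, instanceDir) tuples, and existing_instance_dirs lists the directories already used by installed cores. Both names and the function itself are hypothetical.

```python
import os


def check_instance_dirs(new_entries, existing_instance_dirs):
    """Raise ValueError if a new core targets an instanceDir already in use."""
    in_use = {os.path.normpath(d) for d in existing_instance_dirs}
    for hostname, corename, instance_dir in new_entries:
        normalized = os.path.normpath(instance_dir)
        if normalized in in_use:
            raise ValueError(
                "core %s on %s would install into existing instanceDir %s"
                % (corename, hostname, instance_dir))
        # Also catches two new cores that point at the same directory.
        in_use.add(normalized)
```

Running the check before any download starts means a bad cores version is rejected up front instead of after a large core has already been fetched.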

It makes me wonder about the shard id in Solr Cloud and how that
can be used to uniquely identify an installed core, if a core of
a given name is not guaranteed to be the same across Solr
servers.


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-23 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837418#action_12837418
 ] 

Jason Rutherglen commented on SOLR-1724:


We need a test case with a partial install that covers cleaning up any 
extraneous files afterwards.


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-23 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

I added a test case that simulates attempting to install a bad core.

Still need to get backing up a Solr core to HDFS working.


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-22 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836896#action_12836896
 ] 

Jason Rutherglen commented on SOLR-1724:


I'm taking the approach of simply reusing SnapPuller and a replication handler 
for each core... This'll be faster to implement and more reliable for the first 
release (i.e. I won't run into little wacky bugs because I'll be reusing code 
that's well tested).  


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-22 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836898#action_12836898
 ] 

Jason Rutherglen commented on SOLR-1724:


Actually, I just realized the whole exercise of moving a core is pointless. 
It's exactly the same as replication, so this is a non-issue...

I'm going to work on backing up a core to HDFS...


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835819#action_12835819
 ] 

Jason Rutherglen commented on SOLR-1724:


We need a test case for deleted and modified cores.


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Removing cores seems to work well, on to modified cores... I'm checkpointing 
progress in case things break, so I can easily roll back.

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch


 Though we're implementing cloud, I need something real soon I can
 play with and deploy. So this'll be a patch that only deploys
 new cores, and that's about it. The arch is real simple:
 On Zookeeper there'll be a directory that contains files that
 represent the state of the cores of a given set of servers which
 will look like the following:
 /production/cores-1.txt
 /production/cores-2.txt
 /production/core-host-1-actual.txt (ephemeral node per host)
 Where each core-N.txt file contains:
 hostname,corename,instanceDir,coredownloadpath
 coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
 etc
 and
 core-host-actual.txt contains:
 hostname,corename,instanceDir,size
 Every time a new core-N.txt file is added, the listening host
 finds its entry in the list and begins the process of trying to
 match the entries. Upon completion, it updates its
 /core-host-1-actual.txt file to its completed state or logs an error.
 When all host actual files are written (without errors), then a
 new core-1-actual.txt file is written which can be picked up by
 another process that can create a new core proxy.
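
A minimal sketch of the file format described above, assuming each line of a
cores file is the comma-separated record
hostname,corename,instanceDir,coredownloadpath. The names CoreEntry,
parse_cores_file and entries_for_host are hypothetical and are not taken from
the patch; the sample hosts and download paths are made up for illustration.

```python
from typing import List, NamedTuple


class CoreEntry(NamedTuple):
    """One line of a cores file (hypothetical name, not from the patch)."""
    hostname: str
    corename: str
    instance_dir: str
    core_download_path: str


def parse_cores_file(text: str) -> List[CoreEntry]:
    """Parse 'hostname,corename,instanceDir,coredownloadpath' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first three commas only, so a comma inside the
        # download URL does not break the record.
        fields = line.split(",", 3)
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        entries.append(CoreEntry(*fields))
    return entries


def entries_for_host(entries: List[CoreEntry], hostname: str) -> List[CoreEntry]:
    """Return the entries a listening host would have to match."""
    return [e for e in entries if e.hostname == hostname]


# Made-up sample content for a cores file.
sample = """\
host1,core-a,/solr/core-a,hdfs://namenode/cores/core-a.zip
host2,core-b,/solr/core-b,http://example.org/cores/core-b.zip
host1,core-c,/solr/core-c,file:///tmp/cores/core-c.zip
"""
mine = entries_for_host(parse_cores_file(sample), "host1")
```

Here host1 would pick out core-a and core-c as the entries it has to install
before writing its actual file.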


[jira] Issue Comment Edited: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835871#action_12835871
 ] 

Jason Rutherglen edited comment on SOLR-1724 at 2/19/10 6:36 PM:
-

Removing cores seems to work well, on to modified cores... I'm checkpointing 
progress in case things break, I can easily roll back.

  was (Author: jasonrutherglen):
Removing cores seems to work well, on to modified cores... I checkpointing 
progress in case things break, I can easily roll back.
  
 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835955#action_12835955
]

Jason Rutherglen commented on SOLR-1724:

Also needed is the ability to move an existing core to a
different Solr server. The core will need to be copied via
direct HTTP file access, from a Solr server to another Solr
server. There is no need to zip the core first.

This feature is useful for core indexes that have been
incrementally built, then need to be archived (i.e. the index was not
constructed using Hadoop).

Real Basic Core Management with Zookeeper
-

Key: SOLR-1724
URL: https://issues.apache.org/jira/browse/SOLR-1724
Project: Solr
Issue Type: New Feature
Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
Fix For: 1.5


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835965#action_12835965
 ] 

Jason Rutherglen commented on SOLR-1724:


For the above core moving, utilizing the existing Java replication will 
probably be suitable.  However, in all cases we need to copy the contents of 
all files related to the core (meaning everything under conf and data).  How 
does one accomplish this?

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835981#action_12835981
 ] 

Jason Rutherglen commented on SOLR-1724:


{quote}Will this http access also allow a cluster with
incrementally updated cores to replicate a core after a node
failure? {quote}

You're talking about moving an existing core into HDFS? That's a
great idea... I'll add it to the list!

Maybe for general actions to the system, there can be a ZK
directory acting as a queue that contains actions to be
performed by the cluster. When the action is completed, its
corresponding action file is deleted. 
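
A rough sketch of the queue idea above, using an in-memory dictionary as a
stand-in for the ZK directory so it runs without a ZooKeeper client. The class
name ActionQueue, the "action-NNNNNNNNNN" file naming and the action strings
are all assumptions made for illustration, not anything from the patch.

```python
from typing import Callable, Dict, List


class ActionQueue:
    """In-memory stand-in for a ZK directory of action files (illustrative)."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}
        self._next_seq = 0

    def submit(self, action: str) -> str:
        """Add an action file; zero-padded sequence numbers keep names ordered."""
        name = f"action-{self._next_seq:010d}"
        self._next_seq += 1
        self._files[name] = action
        return name

    def pending(self) -> List[str]:
        """Names of action files still waiting, oldest first."""
        return sorted(self._files)

    def process(self, handler: Callable[[str], None]) -> None:
        """Run each pending action in order; delete its file only on success.

        If the handler raises, the file stays in place and processing stops.
        """
        for name in self.pending():
            handler(self._files[name])
            del self._files[name]


queue = ActionQueue()
queue.submit("move core-a host1 host2")   # made-up action strings
queue.submit("remove core-b host2")
done: List[str] = []
queue.process(done.append)
```

Deleting the file only after the handler returns means a failed action stays
visible in the directory and can be retried.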

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836013#action_12836013
 ] 

Jason Rutherglen commented on SOLR-1724:


I think the check on whether a conf file's been modified, to reload the core, 
can borrow from the replication handler and check the diff based on the 
checksum of the files... Though this somewhat complicates the storage of the 
checksum and the resultant JSON file.
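
A small sketch of the checksum comparison, assuming the stored state is a map
from conf file name to checksum. CRC32 is used here only as an example
algorithm, and the function names are hypothetical; this is not the
replication handler's code.

```python
import zlib
from typing import Dict, Set


def checksums(files: Dict[str, bytes]) -> Dict[str, int]:
    """Map each conf file name to a CRC32 checksum of its contents."""
    return {name: zlib.crc32(data) for name, data in files.items()}


def changed_files(stored: Dict[str, int], current: Dict[str, int]) -> Set[str]:
    """Names that were added, removed, or whose checksum differs."""
    names = set(stored) | set(current)
    return {n for n in names if stored.get(n) != current.get(n)}


def needs_reload(stored: Dict[str, int], current: Dict[str, int]) -> bool:
    """A core would be reloaded when any conf file changed."""
    return bool(changed_files(stored, current))


# Made-up file contents standing in for a core's conf directory.
before = checksums({"solrconfig.xml": b"<config/>", "schema.xml": b"<schema/>"})
after = checksums({"solrconfig.xml": b"<config/>", "schema.xml": b"<schema v='2'/>"})
```

The stored map is what would have to be persisted alongside the core info,
which is the extra storage cost mentioned above.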

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836018#action_12836018
 ] 

Jason Rutherglen commented on SOLR-1724:


Some further notes... I can reuse the replication code, but am going to place 
the functionality into core admin handler because it needs to work across cores 
and not have to be configured in each core's solrconfig.  

Also, we need to somehow support merging cores... Is that available yet?  Looks 
like merge indexes is only for directories?

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836022#action_12836022
 ] 

Jason Rutherglen commented on SOLR-1724:


We need a URL type parameter to indicate whether a URL in a core info entry 
points to a zip file or to a Solr server download point.  

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch



[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-18 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

* No-commit

* NodeCoresManagerTest.testInstallCores works

* There are HDFS test cases using MiniDFSCluster



 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-18 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835490#action_12835490
 ] 

Jason Rutherglen commented on SOLR-1724:


I need to figure out how to integrate this with the Solr Cloud distributed search 
stuff... Hmm... Maybe I'll start with the Solr Cloud test cases?

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch



[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-18 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Updated to HEAD

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-18 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835513#action_12835513
 ] 

Jason Rutherglen commented on SOLR-1724:


I need to add the deletion policy before I can test this in a real environment, 
otherwise bunches of useless files will pile up in ZK.

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch



[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-18 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Added a way to hold a given number of host or cores files around in ZK, after 
which, the oldest are deleted.
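
A sketch of that retention rule, assuming the files are named like
cores-N.txt and that a higher N means a newer file. The function name
split_by_retention and the keep parameter are hypothetical, not the patch's
API.

```python
import re
from typing import List, Tuple


def split_by_retention(names: List[str], keep: int) -> Tuple[List[str], List[str]]:
    """Return (kept, deleted): the `keep` highest-numbered files are kept."""
    if keep < 0:
        raise ValueError("keep must be non-negative")

    def seq(name: str) -> int:
        # Extract N from names such as 'cores-12.txt'.
        match = re.search(r"-(\d+)\.txt$", name)
        if match is None:
            raise ValueError(f"no sequence number in {name!r}")
        return int(match.group(1))

    ordered = sorted(names, key=seq)  # oldest first, compared numerically
    cut = max(len(ordered) - keep, 0)
    return ordered[cut:], ordered[:cut]


kept, deleted = split_by_retention(
    ["cores-10.txt", "cores-2.txt", "cores-9.txt", "cores-1.txt"], keep=2
)
```

Sorting on the parsed number rather than the raw name matters once N reaches
two digits, since "cores-10.txt" sorts before "cores-2.txt" as a string.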

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
 SOLR-1724.patch



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834539#action_12834539
 ] 

Jason Rutherglen commented on SOLR-1724:


There's a wiki for this issue where the general specification is defined: 

http://wiki.apache.org/solr/DeploymentofSolrCoreswithZookeeper

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch



[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-16 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

No-commit

NodeCoresManager[Test] needs more work

A CoreController matchHosts unit test was added to CoreControllerTest

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
 SOLR-1724.patch, SOLR-1724.patch



[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-02-12 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833108#action_12833108
]

Jason Rutherglen commented on SOLR-1301:

There still seems to be a bug where the temporary directory index isn't deleted
on job completion.

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

Attachments: commons-logging-1.0.4.jar,
commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch,
log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch,
SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
SOLR-1301.patch, SolrRecordWriter.java

This patch contains a contrib module that provides distributed indexing
(using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is
twofold:
* provide an API that is familiar to Hadoop developers, i.e. that of
OutputFormat
* avoid unnecessary export and (de)serialization of data maintained on HDFS.
SolrOutputFormat consumes data produced by reduce tasks directly, without
storing it in intermediate files. Furthermore, by using an
EmbeddedSolrServer, the indexing task is split into as many parts as there
are reducers, and the data to be indexed is not sent over the network.
Design
--
Key/value pairs produced by reduce tasks are passed to SolrOutputFormat,
which in turn uses SolrRecordWriter to write this data. SolrRecordWriter
instantiates an EmbeddedSolrServer, and it also instantiates an
implementation of SolrDocumentConverter, which is responsible for turning
Hadoop (key, value) into a SolrInputDocument. This data is then added to a
batch, which is periodically submitted to EmbeddedSolrServer. When the reduce
task completes and the OutputFormat is closed, SolrRecordWriter calls
commit() and optimize() on the EmbeddedSolrServer.
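The batching flow described above can be sketched in plain Java. This is only an illustration, not the code in the patch: DocSink and BatchingWriterSketch are hypothetical names, and DocSink stands in for the EmbeddedSolrServer so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the server that receives document batches. */
interface DocSink<D> {
    void add(List<D> batch);
    void commit();
    void optimize();
}

/** Buffers documents and hands them to the sink in fixed-size batches. */
class BatchingWriterSketch<D> {
    private final DocSink<D> sink;
    private final int batchSize;
    private final List<D> batch = new ArrayList<>();

    BatchingWriterSketch(DocSink<D> sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Adds one document and submits the batch once it is full. */
    void write(D doc) {
        batch.add(doc);
        if (batch.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (batch.isEmpty()) {
            return;
        }
        sink.add(new ArrayList<>(batch)); // copy so the sink owns its list
        batch.clear();
    }

    /** Submits any leftover documents, then commits and optimizes once. */
    void close() {
        flush();
        sink.commit();
        sink.optimize();
    }
}
```

With a batch size of 2 and three documents written, the sink would see one batch of two, one batch of one on close, then commit and optimize.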
The API provides facilities to specify an arbitrary existing solr.home
directory, from which the conf/ and lib/ files will be taken.
This process results in the creation of as many partial Solr home directories
as there were reduce tasks. The output shards are placed in the output
directory on the default filesystem (e.g. HDFS). Such part-N directories
can be used to run N shard servers. Additionally, users can specify the
number of reduce tasks, in particular 1 reduce task, in which case the output
will consist of a single shard.
An example application is provided that processes large CSV files and uses
this API. It uses custom CSV processing to avoid (de)serialization overhead.
This patch relies on hadoop-core-0.19.1.jar. I attached the jar to this
issue, and you should put it in contrib/hadoop/lib.
Note: the development of this patch was sponsored by an anonymous contributor
and approved for release under Apache License.

[jira] Commented: (SOLR-1395) Integrate Katta

2010-02-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832587#action_12832587
 ] 

Jason Rutherglen commented on SOLR-1395:


shyjuThomas,

It'd be good to update this patch to the latest Katta... You're welcome to do 
so... For my project I only need what'll be in SOLR-1724... 

 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
 katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, 
 solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431.patch, 
 SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
 test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop

[jira] Updated: (SOLR-1761) Command line Solr check softwares

2010-02-08 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1761:
---

Attachment: SOLR-1761.patch

No-commit

Here are a couple of apps that:

1) Check the query time
2) Check the last replication time

They exit with error code 1 on failure and 0 on success.
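A minimal sketch of the exit-code convention follows. It is not taken from the patch: QueryTimeCheckSketch, the probe, and the threshold argument are hypothetical, and the placeholder probe does nothing where a real check would query the Solr server.

```java
/** Sketch of a Nagios-style check: exit code 0 on success, 1 on failure. */
class QueryTimeCheckSketch {
    static final int OK = 0;
    static final int FAILURE = 1;

    /** Runs the probe and fails if it throws or takes longer than maxMillis. */
    static int check(Runnable probe, long maxMillis) {
        long start = System.nanoTime();
        try {
            probe.run();
        } catch (RuntimeException e) {
            return FAILURE;
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return elapsedMillis <= maxMillis ? OK : FAILURE;
    }

    public static void main(String[] args) {
        long maxMillis = args.length > 0 ? Long.parseLong(args[0]) : 1000L;
        // Placeholder probe; a real check would issue a query to the server here.
        Runnable probe = () -> { };
        System.exit(check(probe, maxMillis));
    }
}
```

A monitoring system would then only need to look at the process exit status.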

 Command line Solr check softwares
 -

 Key: SOLR-1761
 URL: https://issues.apache.org/jira/browse/SOLR-1761
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: SOLR-1761.patch


 I'm in need of a command-line tool that Nagios and the like can execute to verify
 that a Solr server is working... Basically it'll be a jar with apps that return
 error codes if a given criterion isn't met.

[jira] Updated: (SOLR-1761) Command line Solr check softwares

2010-02-08 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1761:
---

Attachment: SOLR-1761.patch

Here's a cleaned-up, committable version.

 Command line Solr check softwares
 -

 Key: SOLR-1761
 URL: https://issues.apache.org/jira/browse/SOLR-1761
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: SOLR-1761.patch, SOLR-1761.patch


 I'm in need of a command-line tool that Nagios and the like can execute to verify
 that a Solr server is working... Basically it'll be a jar with apps that return
 error codes if a given criterion isn't met.

[jira] Created: (SOLR-1761) Command line Solr check softwares

2010-02-06 Thread Jason Rutherglen (JIRA)

Command line Solr check softwares
-

 Key: SOLR-1761
 URL: https://issues.apache.org/jira/browse/SOLR-1761
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5


I'm in need of a command-line tool that Nagios and the like can execute to verify
that a Solr server is working... Basically it'll be a jar with apps that return
error codes if a given criterion isn't met.

[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-02-03 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829200#action_12829200
]

Jason Rutherglen commented on SOLR-1301:

In production, the latest patch does not leave temporary files behind. Before,
though, we had failed tasks, so perhaps there's still a bug; we won't know until
we run out of disk space again.

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

[jira] Updated: (SOLR-1301) Solr + Hadoop

2010-02-02 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated SOLR-1301:
---

Attachment: SOLR-1301.patch

I added the following to the finally clause of the SolrRecordWriter (SRW) close method:

{code}
FileUtils.forceDelete(new File(temp.toString()));
{code}
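For context, the cleanup-in-finally pattern can be sketched with only the JDK. This is an illustration rather than the patch itself: TempDirCleanupSketch, withTempDir, and TempDirTask are hypothetical names, and java.nio.file is swapped in for commons-io FileUtils.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TempDirCleanupSketch {
    /** Work to run against the temporary directory. */
    interface TempDirTask {
        void run(Path tempDir) throws IOException;
    }

    /** Deletes dir and everything under it; does nothing if dir is absent. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    /** Runs the task in a fresh temp dir and removes the dir even if the task fails. */
    static void withTempDir(TempDirTask task) throws IOException {
        Path temp = Files.createTempDirectory("index-");
        try {
            task.run(temp);
        } finally {
            deleteRecursively(temp);
        }
    }
}
```

Because the delete sits in finally, a failed task should no longer leave its temporary directory on disk.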

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-02-01 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828172#action_12828172
 ] 

Jason Rutherglen commented on SOLR-1301:


There's a bug caused by the latest change:
{quote}
java.io.IOException: java.lang.IllegalArgumentException: Wrong FS: hdfs://mi-prod-app01.ec2.biz360.com:9000/user/hadoop/solr/_attempt_201001212110_2841_r_01_0.1.index-a, expected: file:///
at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:371)
at com.biz360.mi.index.hadoop.HadoopIndexer$ArticleReducer.reduce(HadoopIndexer.java:147)
at com.biz360.mi.index.hadoop.HadoopIndexer$ArticleReducer.reduce(HadoopIndexer.java:103)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://mi-prod-app01.ec2.biz360.com:9000/user/hadoop/solr/_attempt_201001212110_2841_r_01_0.1.index-a, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.solr.hadoop.SolrRecordWriter.zipDirectory(SolrRecordWriter.java:459)
at org.apache.solr.hadoop.SolrRecordWriter.packZipFile(SolrRecordWriter.java:390)
at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:362)
... 5 more
{quote}

 Solr + Hadoop
 -

 Key: SOLR-1301
 URL: https://issues.apache.org/jira/browse/SOLR-1301
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki 
 Fix For: 1.5

 Attachments: commons-logging-1.0.4.jar, 
 commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, 
 log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SolrRecordWriter.java


[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-02-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828368#action_12828368
]

Jason Rutherglen commented on SOLR-1301:

I'm testing deleting the temp dir in SRW.close finally...

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

[jira] Updated: (SOLR-1301) Solr + Hadoop

2010-01-31 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated SOLR-1301:
---

Attachment: SOLR-1301.patch

This update includes Kevin's recommended path change.

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-28 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: gson-1.4.jar
hadoop-0.20.2-dev-test.jar
hadoop-0.20.2-dev-core.jar

Hadoop and Gson dependencies

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
 hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch


 Though we're implementing cloud, I need something real soon I can
 play with and deploy. So this'll be a patch that only deploys
 new cores, and that's about it. The arch is real simple:
 On Zookeeper there'll be a directory that contains files that
 represent the state of the cores of a given set of servers which
 will look like the following:
 /production/cores-1.txt
 /production/cores-2.txt
 /production/core-host-1-actual.txt (ephemeral node per host)
 Where each core-N.txt file contains:
 hostname,corename,instanceDir,coredownloadpath
 coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
 etc
 and
 core-host-actual.txt contains:
 hostname,corename,instanceDir,size
 Every time a new core-N.txt file is added, the listening host
 finds its entry in the list and begins the process of trying to
 match the entries. Upon completion, it updates its
 /core-host-1-actual.txt file to its completed state or logs an error.
 When all host actual files are written (without errors), then a
 new core-1-actual.txt file is written which can be picked up by
 another process that can create a new core proxy.
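The core-N.txt line format above could be read along these lines. This is a sketch under assumptions, not the patch: CoreEntrySketch and its methods are hypothetical, and it assumes one comma-separated entry per line with no commas inside any field (the description says the format will move to JSON anyway).

```java
import java.util.ArrayList;
import java.util.List;

/** One line of a core-N.txt file: hostname,corename,instanceDir,coredownloadpath. */
class CoreEntrySketch {
    final String hostname;
    final String coreName;
    final String instanceDir;
    final String coreDownloadPath;

    CoreEntrySketch(String hostname, String coreName, String instanceDir, String coreDownloadPath) {
        this.hostname = hostname;
        this.coreName = coreName;
        this.instanceDir = instanceDir;
        this.coreDownloadPath = coreDownloadPath;
    }

    /** Parses a single line; rejects lines that do not have exactly four fields. */
    static CoreEntrySketch parseLine(String line) {
        String[] f = line.trim().split(",", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields but got " + f.length + ": " + line);
        }
        return new CoreEntrySketch(f[0], f[1], f[2], f[3]);
    }

    /** Returns the entries a given host should act on, skipping blank lines. */
    static List<CoreEntrySketch> entriesForHost(String fileContent, String host) {
        List<CoreEntrySketch> mine = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            CoreEntrySketch entry = parseLine(line);
            if (entry.hostname.equals(host)) {
                mine.add(entry);
            }
        }
        return mine;
    }
}
```

A listening host would call entriesForHost with its own name, then download each coreDownloadPath into the matching instanceDir.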

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-25 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804590#action_12804590
 ] 

Jason Rutherglen commented on SOLR-1724:


{quote}If you know you're not going to store file data at nodes
that have children (the only way that downloading to a real file
system makes sense), you could just call getChildren - if there
are children, it's a dir, otherwise it's a file. Doesn't work for
empty dirs, but you could also just do getData, and if it
returns null, treat it as a dir, else treat it as a file.{quote}

Thanks Mark... 
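A rough sketch of the getData approach from the quote, applied to a recursive walk. NodeReaderSketch is a hypothetical, simplified interface and not the real ZooKeeper client API (whose getChildren and getData take extra watch and Stat arguments); ZkTreeWalkSketch is likewise only an illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal view of a ZooKeeper-like tree; not the real client API. */
interface NodeReaderSketch {
    List<String> getChildren(String path); // child names, empty if none
    byte[] getData(String path);           // null if the node holds no data
}

class ZkTreeWalkSketch {
    /** Heuristic from the quote: null data means directory, otherwise file. */
    static boolean isDirectory(NodeReaderSketch zk, String path) {
        return zk.getData(path) == null;
    }

    /** Collects the paths of all file-like nodes under root, depth first. */
    static List<String> listFiles(NodeReaderSketch zk, String root) {
        List<String> files = new ArrayList<>();
        walk(zk, root, files);
        return files;
    }

    private static void walk(NodeReaderSketch zk, String path, List<String> files) {
        if (!isDirectory(zk, path)) {
            files.add(path);
            return;
        }
        for (String child : zk.getChildren(path)) {
            String childPath = path.endsWith("/") ? path + child : path + "/" + child;
            walk(zk, childPath, files);
        }
    }
}
```

The heuristic only holds if nodes with children never carry data, which is the precondition the quote states.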

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-25 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804655#action_12804655
 ] 

Jason Rutherglen commented on SOLR-1724:


We need a command-line tool that dumps the state of the
existing cluster from ZK out to a JSON file for a particular
version.

For my setup I'll have a program that'll look at this cluster
state file and generate an input file that'll be written to ZK,
which essentially instructs the Solr nodes to match the new
cluster state. This allows me to easily write my own
functionality that operates on the cluster that's external to
deploying new software into Solr. 

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-25 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804750#action_12804750
 ] 

Jason Rutherglen commented on SOLR-1724:


I did an svn update, though now am seeing the following error:

java.util.concurrent.TimeoutException: Could not connect to ZooKeeper within 5000 ms
at org.apache.solr.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:131)
at org.apache.solr.cloud.SolrZkClient.init(SolrZkClient.java:106)
at org.apache.solr.cloud.SolrZkClient.init(SolrZkClient.java:72)
at org.apache.solr.cloud.CoreControllerTest.testCores(CoreControllerTest.java:48)

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-25 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804760#action_12804760
 ] 

Jason Rutherglen commented on SOLR-1724:


The ZK port changed in ZkTestServer

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-25 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804773#action_12804773
 ] 

Jason Rutherglen commented on SOLR-1724:


For some reason ZkTestServer doesn't need to be shut down any longer?

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-22 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803943#action_12803943
 ] 

Jason Rutherglen commented on SOLR-1724:


Do we have some code that recursively downloads a tree of files from ZK?  The 
challenge is I don't see a way to find out if a given path represents a 
directory or not.

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5

 Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-21 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Here's the first cut... I agree, I'm not really into ephemeral
ZK nodes for Solr hosts/nodes. The reason is that contact with ZK is
highly superficial and can be intermittent. I'm mostly concerned
with ensuring the core operations succeed on a given server. If
a server goes down, there needs to be more than ZK to prove it,
and if it goes down completely, I'll simply reallocate its
cores to another server using the core management mechanism
provided in this patch.

The issue is still being worked on, specifically the Solr server
portion that downloads the cores from some location, or performs
operations. The file format will move to JSON.

Real Basic Core Management with Zookeeper
-

Key: SOLR-1724
URL: https://issues.apache.org/jira/browse/SOLR-1724
Project: Solr
Issue Type: New Feature
Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
Fix For: 1.5

Attachments: SOLR-1724.patch

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801215#action_12801215
 ] 

Jason Rutherglen commented on SOLR-1724:


Note to self: I need a way to upload an empty core/confdir from the command
line, basically into ZK, then reference that core from ZK (I think this'll
work?).  I'd rather not rely on a separate http server or something... The size
of a jarred-up Solr conf dir shouldn't be too much for ZK?
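One way to test the size question is to pack the conf dir in memory and compare against a per-node limit. The sketch below is an assumption-laden illustration: ConfDirPackSketch is a hypothetical name, a zip archive stands in for the jar, and the limit constant assumes ZooKeeper's default of just under 1 MB per node (the jute.maxbuffer setting), which a deployment may have changed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ConfDirPackSketch {
    /** Assumed per-node payload limit; check the value configured on your cluster. */
    static final int ASSUMED_MAX_NODE_BYTES = 1024 * 1024 - 1;

    /** Zips every regular file under dir into an in-memory archive. */
    static byte[] zipDirectory(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(dir)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Path file : files) {
                // Entry names are relative to dir and use '/' separators.
                String name = dir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                zip.write(Files.readAllBytes(file));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** True if the packed bytes fit under the assumed single-node limit. */
    static boolean fitsInOneNode(byte[] payload) {
        return payload.length <= ASSUMED_MAX_NODE_BYTES;
    }
}
```

If fitsInOneNode returns false, the archive would have to be split across nodes or served from somewhere else.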

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-16 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801216#action_12801216
]

Jason Rutherglen commented on SOLR-1724:

Ted,

Thanks for the Katta link.

This patch will likely de-emphasize the distributed search part,
which is where the ephemeral node is used (i.e. a given server
lists its current state). I basically want to take care of this
one little deployment aspect of cores, improving on the wacky
hackedy system I'm running today. Then IF it works, then I'll
look at the distributed search part, hopefully in a totally
separate patch.

Real Basic Core Management with Zookeeper
-

Key: SOLR-1724
URL: https://issues.apache.org/jira/browse/SOLR-1724
Project: Solr
Issue Type: New Feature
Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
Fix For: 1.5

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801244#action_12801244
 ] 

Jason Rutherglen commented on SOLR-1724:


This'll be a patch on the cloud branch to reuse what's been started. I don't see any
core management code in there yet, so this looks complementary.

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5


 Though we're implementing cloud, I need something real soon I can
 play with and deploy. So this'll be a patch that only deploys
 new cores, and that's about it. The arch is real simple:
 On Zookeeper there'll be a directory that contains files that
 represent the state of the cores of a given set of servers which
 will look like the following:
 /production/cores-1.txt
 /production/cores-2.txt
 /production/core-host-1-actual.txt (ephemeral node per host)
 Where each core-N.txt file contains:
 hostname,corename,instanceDir,coredownloadpath
 coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
 etc
 and
 core-host-actual.txt contains:
 hostname,corename,instanceDir,size
 Every time a new core-N.txt file is added, the listening host
 finds its entry in the list and begins the process of trying to
 match the entries. Upon completion, it updates its
 /core-host-1-actual.txt file to its completed state or logs an error.
 When all host actual files are written (without errors), then a
 new core-1-actual.txt file is written which can be picked up by
 another process that can create a new core proxy.

[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-01-15 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800756#action_12800756
]

Jason Rutherglen commented on SOLR-1301:

Andrzej's model works great in production. We have both 1)
master-slave for incremental updates, and 2) indexing in Hadoop
with this patch, after which we deploy each new core/shard in a
balanced fashion to many servers. They're two separate
modalities. The ZK stuff (as it's modeled today) isn't useful
here, because I want the schema I indexed with as a part of the
zip file stored in HDFS (or S3, or wherever).

Any sort of ZK thingy is good for managing the cores/shards
across many servers, however Katta does this already (so we're
reinventing the same thing, which is not necessarily bad
if we also have a clear path for incremental indexing, as
discussed above). Ultimately, the Solr server can be viewed as
simply a container for cores, and the cloud + ZK branch as a
manager of cores/shards. Anything more ambitious will probably
be overkill, and this is what I believe Ted has been trying to get at.

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-01-15 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800775#action_12800775
]

Jason Rutherglen commented on SOLR-1301:

{quote}What I meant was the Hadoop job could simply know what
the set of master indexers are and send the documents directly
to them{quote}

One can use Hadoop for this purpose. We have implemented the
system in this way for the incremental indexes, however it
doesn't require a separate patch or contrib module. The problem
with the Hadoop streaming model is that it doesn't scale well if,
for example, we need to reindex using the CJKAnalyzer, or using
Basis' analyzer, etc. We use SOLR-1301 for reindexing loads of
data as fast as possible by parallelizing the indexing. There
are lots of little things I'd like to add to the functionality,
though implementing ZK based core management takes a higher
priority, as I spend a lot of time doing this manually today.

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-01-15 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800802#action_12800802
]

Jason Rutherglen commented on SOLR-1301:

bq. Hadoop streaming the output of the reduce tasks to the Solr
indexing servers.

Yes, this is what we've implemented; it's just normal Solr HTTP
based indexing, right? It works well to a limited degree, and
for the particular implementation details, there are reasons why
this can be less than ideal. The balanced, distributed
shards/cores system works far better and enables us to use less
hardware (but I'm not going into all the details here).

One issue I can mention is the switch over to a new set of
incremental servers (which happens when the old servers fill
up). I'm looking to automate this, and will likely focus on it
and the core management in the cloud branch.

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5

[jira] Created: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-15 Thread Jason Rutherglen (JIRA)

Real Basic Core Management with Zookeeper
-

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5


Though we're implementing cloud, I need something real soon I can
play with and deploy. So this'll be a patch that only deploys
new cores, and that's about it. The arch is real simple:

On Zookeeper there'll be a directory that contains files that
represent the state of the cores of a given set of servers which
will look like the following:

/production/cores-1.txt
/production/cores-2.txt
/production/core-host-1-actual.txt (ephemeral node per host)

Where each core-N.txt file contains:

hostname,corename,instanceDir,coredownloadpath

coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
etc

and

core-host-actual.txt contains:

hostname,corename,instanceDir,size

Every time a new core-N.txt file is added, the listening host
finds its entry in the list and begins the process of trying to
match the entries. Upon completion, it updates its
/core-host-1-actual.txt file to its completed state or logs an error.

When all host actual files are written (without errors), then a
new core-1-actual.txt file is written which can be picked up by
another process that can create a new core proxy.
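
As a rough illustration of the file format described above (not code from the patch), the sketch below parses lines of hostname,corename,instanceDir,coredownloadpath and works out which cores a given host still has to deploy. The names CoreEntry, parse_core_file and cores_to_deploy, and the sample hosts and paths, are hypothetical.

```python
from typing import NamedTuple


class CoreEntry(NamedTuple):
    hostname: str
    corename: str
    instance_dir: str
    download_path: str  # a URL such as hdfs://... or file://...


def parse_core_file(text: str) -> list[CoreEntry]:
    """Parse lines of hostname,corename,instanceDir,coredownloadpath."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first three commas only, so the URL is kept whole.
        host, core, instance_dir, path = line.split(",", 3)
        entries.append(CoreEntry(host, core, instance_dir, path))
    return entries


def cores_to_deploy(desired: list[CoreEntry], host: str,
                    installed: set[str]) -> list[CoreEntry]:
    """Entries assigned to `host` whose core is not installed yet."""
    return [e for e in desired
            if e.hostname == host and e.corename not in installed]
```

A listening host would call cores_to_deploy with its own hostname and the set of core names it already has, then download each returned entry from its download_path.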

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-15 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800994#action_12800994
 ] 

Jason Rutherglen commented on SOLR-1724:


Additionally, upon successful completion of a core-version deployment to a set
of nodes, a customizable deletion-policy-like mechanism will, by default,
clean up the old cores on the system.
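
One possible shape for such a policy, purely as a sketch: keep the newest N core versions and report the rest for removal. The function name cores_to_delete and the keep_last parameter are assumptions for illustration, not anything defined in the patch.

```python
def cores_to_delete(versions: list[int], keep_last: int = 1) -> list[int]:
    """Return the core versions to remove, keeping the newest `keep_last`."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    ordered = sorted(versions)
    # Everything except the last `keep_last` versions is eligible for cleanup.
    return ordered[:-keep_last]
```

With the default, only the most recently deployed version survives; a more cautious setup might pass keep_last=2 so that a rollback target is retained.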

 Real Basic Core Management with Zookeeper
 -

 Key: SOLR-1724
 URL: https://issues.apache.org/jira/browse/SOLR-1724
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 1.4
Reporter: Jason Rutherglen
 Fix For: 1.5


 Though we're implementing cloud, I need something real soon I can
 play with and deploy. So this'll be a patch that only deploys
 new cores, and that's about it. The arch is real simple:
 On Zookeeper there'll be a directory that contains files that
 represent the state of the cores of a given set of servers which
 will look like the following:
 /production/cores-1.txt
 /production/cores-2.txt
 /production/core-host-1-actual.txt (ephemeral node per host)
 Where each core-N.txt file contains:
 hostname,corename,instanceDir,coredownloadpath
 coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
 etc
 and
 core-host-actual.txt contains:
 hostname,corename,instanceDir,size
 Every time a new core-N.txt file is added, the listening host
 finds its entry in the list and begins the process of trying to
 match the entries. Upon completion, it updates its
 /core-host-1-actual.txt file to its completed state or logs an error.
 When all host actual files are written (without errors), then a
 new core-1-actual.txt file is written which can be picked up by
 another process that can create a new core proxy.

[jira] Commented: (SOLR-1720) replication configuration bug with multiple replicateAfter values

2010-01-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799843#action_12799843
 ] 

Jason Rutherglen commented on SOLR-1720:


For consistency maybe we should support comma delimited lists?  I edit the 
shards a lot (comma delimited), which could use different elements as well, so 
by rote, I just used commas for this, because it seemed like a Solr standard... 

Thanks for clarifying!
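
To make the suggestion concrete, here is a small sketch of a parser that accepts both repeated values and comma delimited lists. It is not Solr's actual configuration code; parse_replicate_after is a hypothetical helper.

```python
def parse_replicate_after(raw_values: list[str]) -> list[str]:
    """Flatten repeated and comma delimited values, dropping duplicates."""
    events = []
    for raw in raw_values:
        for part in raw.split(","):
            part = part.strip()
            if part and part not in events:
                events.append(part)
    return events
```

Under this scheme a single "commit,startup" value and two separate "commit" and "startup" values would be treated the same way.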

 replication configuration bug with multiple replicateAfter values
 -

 Key: SOLR-1720
 URL: https://issues.apache.org/jira/browse/SOLR-1720
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Yonik Seeley
 Fix For: 1.5


 Jason reported problems with Multiple replicateAfter values - it worked after 
 changing to just commit
 http://www.lucidimagination.com/search/document/e4c9ba46dc03b031/replication_problem

[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-07 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797898#action_12797898
]

Jason Rutherglen commented on SOLR-1709:

Tim,

Thanks for the patch...

bq. as I'm having a bit of trouble with svn (don't shoot me, but my environment
is a Redmond-based os company).

TortoiseSVN works well on Windows, even for creating patches. Have you tried
it?

Distributed Date Faceting
-

Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor

This patch is for adding support for date facets when using distributed
searches.
Date faceting across multiple machines exposes some time-based issues that
anyone interested in this behaviour should be aware of:
Any time and/or time-zone differences are not accounted for in the patch
(i.e. merged date facets are at a time-of-day, not necessarily at a universal
'instant-in-time', unless all shards are time-synced to the exact same time).
The implementation uses the first encountered shard's facet_dates as the
basis for subsequent shards' data to be merged in.
This means that if subsequent shards' facet_dates are skewed in relation to
the first by 1 'gap', these 'earlier' or 'later' facets will not be merged
in.
There are several reasons for this:
* Performance: It's faster to check facet_date lists against a single map's
data, rather than against each other, particularly if there are many shards
* If 'earlier' and/or 'later' facet_dates are added in, this will make the
time range larger than that which was requested
(e.g. a request for one hour's worth of facets could bring back 2, 3
or more hours of data)
This could be dealt with if timezone and skew information was added, and
the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and
'now' parameters to the 'facet_dates' map. This would tell requesters what
time and TZ the remote server thinks it is, and so multiple shards' time data
can be normalized.
The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java
The main changes are in FacetComponent - ResponseBuilder is just to hold the
completed SimpleOrderedMap until the finishStage.
One possible enhancement is to perhaps make this an optional parameter, but
really, if facet.date parameters are specified, it is assumed they are
desired.
Comments and suggestions welcome.
As a favour to ask, if anyone could take my 2 source files and create a PATCH
file from it, it would be greatly appreciated, as I'm having a bit of trouble
with svn (don't shoot me, but my environment is a Redmond-based os company).
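
The merge behaviour this description outlines (the first shard's facet_dates is the basis, and buckets that only appear on later shards are not merged in) can be sketched roughly as follows. This is a simplified model and not the patch itself: each shard's result is assumed to be a plain mapping from bucket start to count, and merge_facet_dates is a hypothetical name.

```python
def merge_facet_dates(shard_results: list[dict[str, int]]) -> dict[str, int]:
    """Merge per-shard date facet counts using the first shard as the basis."""
    if not shard_results:
        return {}
    merged = dict(shard_results[0])
    for shard in shard_results[1:]:
        for bucket, count in shard.items():
            # Buckets unknown to the first shard ('earlier' or 'later'
            # ones caused by skew) are ignored rather than added.
            if bucket in merged:
                merged[bucket] += count
    return merged
```

Checking every later shard against one map is what keeps the merge cheap, and it also guarantees the returned range never grows beyond what the first shard reported.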

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-21 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793358#action_12793358
]

Jason Rutherglen commented on SOLR-1277:

{quote}Zookeeper gives us the layout of the cluster. It doesn't
seem like we need (yet) fast failure detection from zookeeper -
other nodes can do this synchronously themselves (and would need
to anyway) on things like connection failures. App-level
timeouts should not mark the node as failed since we don't know
how long the request was supposed to take.{quote}

Google Chubby, when used in conjunction with search, sets a high
timeout of 60 seconds, I believe?

Fast failover is difficult so it'll be best to enable fast
re-requesting to adjacent slave servers on request failure.
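
A minimal sketch of that re-requesting idea, assuming the client already knows the list of servers holding copies of a shard. The send callable and the host names are placeholders; nothing here is an existing Solr API.

```python
from typing import Callable, Sequence


def request_with_failover(replicas: Sequence[str],
                          send: Callable[[str], str]) -> str:
    """Try each replica in order and return the first successful response."""
    last_error = None
    for host in replicas:
        try:
            return send(host)
        except OSError as err:
            # Connection-level failure: move on to the adjacent replica.
            last_error = err
    raise RuntimeError("all replicas failed") from last_error
```

Only connection-level errors trigger a retry here, which matches the quoted point that app-level timeouts should not by themselves mark a node as failed.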

Mahadev has some good advice about how we can separate the logic
into different znodes. Going further I think we'll want to allow
cores to register themselves, then listen to a separate
directory as to what state each should be in. We'll need to
ensure the architecture allows for defining multiple tiers (like a pyramid).

At http://wiki.apache.org/solr/ZooKeeperIntegration, is a node a
core or a server/CoreContainer?

To move ahead we'll really need to define and settle on the
directory and file structure. I believe the requirement of
grouping cores so that one may issue a search against a group
name, instead of individual shard names will be useful. The
ability to move cores to different nodes will be necessary, as
is the ability to replicate cores (i.e. have multiple copies
available on different servers).

Today I deploy lots of cores from HDFS across quite a few
servers containing 1.6 billion documents representing at least
2.4 TB of data. I mention this because a lot can potentially go
wrong in this type of setup (i.e. servers going down, corrupted
data, intermittent network, etc). I generate a file that contains
all the information as to which core should go to which Solr
server using size based balancing. Ideally I'd be able to
generate a new file, perhaps for load balancing the cores across
new Solr servers or to define that hot cores should be
replicated, and the Solr cluster would move the cores to the
defined servers automatically. This doesn't include the separate
system of servers that handles incremental updates (i.e.
master - slave).
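
The comment does not say how the size based balancing is computed. One common approach, shown here only as an assumed stand-in, is a greedy pass that hands the largest remaining core to the least loaded server. The function name, core names and sizes are made up for the example.

```python
import heapq


def balance_by_size(core_sizes: dict[str, int],
                    servers: list[str]) -> dict[str, list[str]]:
    """Assign each core to the least-loaded server, largest cores first."""
    if not servers:
        raise ValueError("at least one server is required")
    heap = [(0, s) for s in servers]  # (bytes assigned so far, server)
    heapq.heapify(heap)
    assignment = {s: [] for s in servers}
    ordered = sorted(core_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for core, size in ordered:
        load, server = heapq.heappop(heap)
        assignment[server].append(core)
        heapq.heappush(heap, (load + size, server))
    return assignment
```

The resulting mapping could then be written out as the file that tells each server which cores to fetch.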

There's a bit of trepidation in moving forward on this because
we don't want to engineer ourselves into a hole, however if we
need to change the structure of the znodes in the future, we'll
need a healthy versioning plan such that one may upgrade a
cluster while maintaining backwards compatibility on a live
system. Let's think of a basic plan for this.

In conclusion, let's iterate on the directory structure via the
wiki or this issue?

{quote}A search node can have very large caches tied to readers
that all drop at once on commit, and can require a much larger
heap to accommodate these caches. I think thats a more common
scenario that creates these longer pauses.{quote}

The large cache issue should be fixable with the various NRT
changes in SOLR-1606. They're collectively not much different than
the search and sort per segment changes made to Lucene 2.9.

Implement a Solr specific naming service (using Zookeeper)
--

Key: SOLR-1277
URL: https://issues.apache.org/jira/browse/SOLR-1277
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5

Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch,
SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

Original Estimate: 672h
Remaining Estimate: 672h

The goal is to give Solr server clusters self-healing attributes
where if a server fails, indexing and searching don't stop and
all of the partitions remain searchable. For configuration, the
ability to centrally deploy a new configuration without servers
going offline.
We can start with basic failover and go from there?
Features:
* Automatic failover (i.e. when a server fails, clients stop
trying to index to or search it)
* Centralized configuration management (i.e. new solrconfig.xml
or schema.xml propagates to a live Solr cluster)
* Optionally allow shards of a partition to be moved to another
server (i.e. if a server gets hot, move the hot segments out to
cooler servers). Ideally we'd have a way to detect hot segments
and move them seamlessly. With NRT this becomes somewhat more
difficult but not impossible?

[jira] Commented: (SOLR-1665) Add debugTimings param so that timings for components can be retrieved without having to do explains(), as in debugQuery

2009-12-21 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793474#action_12793474
 ] 

Jason Rutherglen commented on SOLR-1665:


Plus one, visibility into the components would be good.  This'll work for 
distributed processes (i.e. time taken on each node per component)?

 Add debugTimings param so that timings for components can be retrieved 
 without having to do explains(), as in debugQuery
 --

 Key: SOLR-1665
 URL: https://issues.apache.org/jira/browse/SOLR-1665
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 As the title says, it would be great if we could just get back component 
 timings w/o having to do the full boat of explains and other stuff.

[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader

2009-12-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789600#action_12789600
 ] 

Jason Rutherglen commented on SOLR-1506:


There's a different bug here. Because CoreContainer loads
the cores sequentially, and MultiCoreReaderFactory looks for all
the cores, not all the cores are searchable when the proxy core
isn't last. If the proxy is first, an exception is thrown.

The workaround is to place the proxy core last, however that's
not possible when using the core admin HTTP API. Hmm... Not sure
what the best workaround is.

 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch, SOLR-1506.patch, SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching however
 SOLR-1477, if it gets completed, provides that out of the box.

[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787619#action_12787619
 ] 

Jason Rutherglen commented on SOLR-1606:


{quote}In any case, I assume it must not fsync the files, so you
don't get a commit where you know your in a stable
condition?{quote}

OK, right, for the user commit currently means that after the
call, the index is in a stable state, and that it can be
replicated? I agree, for clarity, I'll create a refresh command
and remove the NRT option from the commit command.



 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.

[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787621#action_12787621
 ] 

Jason Rutherglen commented on SOLR-1606:


{quote}For example, q=foo&freshness=1000 would cause a new realtime reader to
be opened if the current one was more than 1000ms old.{quote}

Good idea.
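
The freshness idea quoted above could look something like the sketch below: the holder reopens its reader only when a request asks for something fresher than what is currently open. ReaderHolder and the open_reader callback are hypothetical stand-ins, not Solr or Lucene classes.

```python
import time


class ReaderHolder:
    """Holds a reader and reopens it when older than the requested freshness."""

    def __init__(self, open_reader, clock=time.monotonic):
        self._open_reader = open_reader
        self._clock = clock
        self._reader = open_reader()
        self._opened_at = clock()

    def get(self, freshness_ms=None):
        """Return a reader no older than `freshness_ms`, reopening if needed."""
        age_ms = (self._clock() - self._opened_at) * 1000.0
        if freshness_ms is not None and age_ms > freshness_ms:
            self._reader = self._open_reader()
            self._opened_at = self._clock()
        return self._reader
```

Requests that pass no freshness value simply reuse whatever reader is open, so only callers that care about recency pay the reopen cost.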

 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.

[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787686#action_12787686
 ] 

Jason Rutherglen commented on SOLR-1606:


I was going to start on the auto-warming using IndexWriter's
IndexReaderWarmer, however because this is heavily cache
dependent I think it'll have to wait for SOLR-1308 because we
need to regenerate the cache per reader. 

 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.

[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787800#action_12787800
 ] 

Jason Rutherglen commented on SOLR-1606:


The current NRT IndexWriter.getReader API cannot yet support
IndexReaderFactory. I'll open a Lucene issue.

 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.

[jira] Commented: (SOLR-433) MultiCore and SpellChecker replication

2009-12-07 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787155#action_12787155
]

Jason Rutherglen commented on SOLR-433:
---

Are the existing patches for multiple cores or only for spellchecking?

MultiCore and SpellChecker replication
--

Key: SOLR-433
URL: https://issues.apache.org/jira/browse/SOLR-433
Project: Solr
Issue Type: Improvement
Components: replication (scripts), spellchecker
Affects Versions: 1.3
Reporter: Otis Gospodnetic
Fix For: 1.5

Attachments: RunExecutableListener.patch, SOLR-433-r698590.patch,
SOLR-433.patch, SOLR-433.patch, SOLR-433.patch, SOLR-433.patch,
solr-433.patch, SOLR-433_unified.patch, spellindexfix.patch

With MultiCore functionality coming along, it looks like we'll need to be
able to:
A) snapshot each core's index directory, and
B) replicate any and all cores' complete data directories, not just their
index directories.
Pulled from the spellchecker and multi-core index replication thread -
http://markmail.org/message/pj2rjzegifd6zm7m
Otis:
I think that makes sense - distribute everything for a given core, not just
its index. And the spellchecker could then also have its data dir (and only
index/ underneath really) and be replicated in the same fashion.
Right?
Ryan:
Yes, that was my thought. If an arbitrary directory could be distributed,
then you could have
/path/to/dist/index/...
/path/to/dist/spelling-index/...
/path/to/dist/foo
and that would all get put into a snapshot. This would also let you put
multiple cores within a single distribution:
/path/to/dist/core0/index/...
/path/to/dist/core0/spelling-index/...
/path/to/dist/core0/foo
/path/to/dist/core1/index/...
/path/to/dist/core1/spelling-index/...
/path/to/dist/core1/foo

[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787206#action_12787206
 ] 

Jason Rutherglen commented on SOLR-1606:


Koji,

Looks like a change to trunk is causing the error. Also, when I step through, it
passes; when I run without stepping, it fails...

 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.

[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787221#action_12787221
 ] 

Jason Rutherglen commented on SOLR-1606:


bq. Don't we need a new command, like update_realtime

We could, however it'd work the same as commit?  Meaning afterwards, all pending
changes (including deletes) are available?  The commit command is fairly 
overloaded as is.  Are you thinking in terms of replication?

 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.

[jira] Commented: (SOLR-1619) Cache documents by their internal ID

2009-12-04 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786233#action_12786233
 ] 

Jason Rutherglen commented on SOLR-1619:


Right, we'd somehow give the user either option.  

 Cache documents by their internal ID
 

 Key: SOLR-1619
 URL: https://issues.apache.org/jira/browse/SOLR-1619
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


 Currently documents are cached by their Lucene docid, however we can instead 
 cache them using their schema derived unique id.

[jira] Commented: (SOLR-1308) Cache docsets at the SegmentReader level

2009-12-04 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786240#action_12786240
]

Jason Rutherglen commented on SOLR-1308:

{quote} Yeah... that's a pain. We could easily do per-segment
faceting for non-string types though (int, long, etc) since they
don't need to be merged. {quote}

I opened SOLR-1617 for this. I think doc sets can be handled
with a multi doc set (hopefully). Facets however, argh,
FacetComponent is really hairy, though I think it boils down to
simply adding up the counts for the same field values? Then there seem to
be edge cases which I'm scared of. At least it's easy to test
whether we're fulfilling today's functionality by randomly unit
testing per-segment and multi-segment side by side (i.e. if the
results of one are different than the results of the other, we
know there's something to fix).
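
A toy version of that side-by-side check, under the simplifying assumption that a segment is just a list of field values and a facet result is a count per value. The function names are illustrative; real faceting in FacetComponent has more cases than this covers.

```python
import random
from collections import Counter


def facet_counts(values):
    """Facet counts computed over one flat (multi-segment) list of values."""
    return Counter(values)


def merge_segment_facets(segments):
    """Facet counts computed per segment, then added up per field value."""
    total = Counter()
    for segment in segments:
        total.update(facet_counts(segment))
    return total


def random_side_by_side_check(seed, num_segments=5):
    """Compare per-segment and whole-index counts on random data."""
    rng = random.Random(seed)
    segments = [[rng.choice("abcde") for _ in range(rng.randint(0, 50))]
                for _ in range(num_segments)]
    flat = [value for segment in segments for value in segment]
    return merge_segment_facets(segments) == facet_counts(flat)
```

Running random_side_by_side_check over many seeds is the kind of randomized comparison the paragraph describes: any seed that returns False points at a merge bug.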

Perhaps we can initially add up field values, and test that
(which is enough for my project), and move from there. I'd still
like to genericize all of the distributed processes to work over
multiple segments (like Lucene distributed search uses a
MultiSearcher which also works locally), so that local or
distributed is the same API wise. However, I've had trouble
figuring out the existing distributed code (SOLR-1477 ran into a
wall). Maybe as part of SolrCloud
(http://wiki.apache.org/solr/SolrCloud) we can rework the
distributed APIs to be more user friendly (i.e. *MultiSearcher
is really easy to understand). If Solr's going to work well in
the cloud, distributed search probably needs to be easy to multi
tier for scaling (i.e. if we have 1 proxy server and 100 nodes,
we could have 1 top proxy, and 1 proxy per 10 nodes, etc).

Cache docsets at the SegmentReader level

Key: SOLR-1308
URL: https://issues.apache.org/jira/browse/SOLR-1308
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5

Original Estimate: 504h
Remaining Estimate: 504h

Solr caches docsets at the top level Multi*Reader level. After a
commit, the filter/docset caches are flushed. Reloading the
cache in near realtime (i.e. commits every 1s - 2min)
unnecessarily consumes IO resources when reloading the filters,
especially for largish indexes.
We'll cache docsets at the SegmentReader level. The cache key
will include the reader.
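
A minimal sketch of what "the cache key will include the reader"
could look like, assuming each segment exposes some stable key
object. SegmentDocSetCache, the String filter key, and the BitSet
docset are all stand-ins for illustration, not Solr's classes.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-segment docset cache; not Solr's actual classes. */
class SegmentDocSetCache {
    // The outer map holds segment keys weakly, so a segment's entries can
    // be collected once nothing else references its key (e.g. after the
    // segment has been merged away and its reader closed).
    private final Map<Object, Map<String, BitSet>> bySegment = new WeakHashMap<>();

    /** Returns the cached docset for (segmentKey, filter), computing it once. */
    synchronized BitSet get(Object segmentKey, String filter,
                            Function<String, BitSet> compute) {
        return bySegment
            .computeIfAbsent(segmentKey, k -> new ConcurrentHashMap<>())
            .computeIfAbsent(filter, compute);
    }

    /** Number of segments that currently have cached docsets. */
    synchronized int cachedSegments() {
        return bySegment.size();
    }
}
```

After a commit, only the new segments miss the cache; unchanged
segments keep their docsets.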


[jira] Commented: (SOLR-1308) Cache docsets and docs at the SegmentReader level

2009-12-03 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785433#action_12785433
 ] 

Jason Rutherglen commented on SOLR-1308:


I realized because of UnInvertedField, we'll need to merge facet
results from UIF per reader, so using a MultiDocSet won't help. Can we
leverage the distributed merging that FacetComponent implements
(i.e. reuse and/or change the code to work in both the
distributed and local cases)? Ah well, I was hoping for an easy
solution for realtime facets. 

 Cache docsets and docs at the SegmentReader level
 -

 Key: SOLR-1308
 URL: https://issues.apache.org/jira/browse/SOLR-1308
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

   Original Estimate: 504h
  Remaining Estimate: 504h

 Solr caches docsets and documents at the top level Multi*Reader
 level. After a commit, the caches are flushed. Reloading the
 caches in near realtime (i.e. commits every 1s - 2min)
 unnecessarily consumes IO resources, especially for largish
 indexes.
 We can cache docsets and documents at the SegmentReader level.
 The cache settings in SolrConfig can be applied to the
 individual SR caches.


[jira] Updated: (SOLR-1308) Cache docsets at the SegmentReader level

2009-12-03 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated SOLR-1308:
---

Description:
Solr caches docsets at the top level Multi*Reader level. After a
commit, the filter/docset caches are flushed. Reloading the
cache in near realtime (i.e. commits every 1s - 2min)
unnecessarily consumes IO resources when reloading the filters,
especially for largish indexes.

We'll cache docsets at the SegmentReader level. The cache key
will include the reader.

was:
Solr caches docsets and documents at the top level Multi*Reader
level. After a commit, the caches are flushed. Reloading the
caches in near realtime (i.e. commits every 1s - 2min)
unnecessarily consumes IO resources, especially for largish
indexes.

We can cache docsets and documents at the SegmentReader level.
The cache settings in SolrConfig can be applied to the
individual SR caches.

Summary: Cache docsets at the SegmentReader level (was: Cache docsets
and docs at the SegmentReader level)

I changed the title because we're not going to cache docs in
this issue (though I think it's possible to cache docs by the
internal id, rather than the doc id).

Per-segment facet caching and merging per segment can go into a
different issue.

Cache docsets at the SegmentReader level

Key: SOLR-1308
URL: https://issues.apache.org/jira/browse/SOLR-1308
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5

Original Estimate: 504h
Remaining Estimate: 504h


[jira] Created: (SOLR-1617) Cache and merge facets per segment

2009-12-03 Thread Jason Rutherglen (JIRA)

Cache and merge facets per segment
--

 Key: SOLR-1617
 URL: https://issues.apache.org/jira/browse/SOLR-1617
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


Spinoff from SOLR-1308.  We'll enable per-segment facet caching and merging 
which will allow near realtime faceted searching.


[jira] Created: (SOLR-1618) Merge docsets on segment merge

2009-12-03 Thread Jason Rutherglen (JIRA)

Merge docsets on segment merge
--

 Key: SOLR-1618
 URL: https://issues.apache.org/jira/browse/SOLR-1618
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


When SOLR-1308 is implemented, we can save some time when creating new docsets 
by merging them in RAM as segments are merged (similar to LUCENE-1785).
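
A rough sketch of the idea, under the simplifying assumption that
the merged segment is a plain concatenation of the source segments
with no deleted documents. Real merges renumber documents when
deletions are dropped, so an actual implementation would need a doc
id map. DocSetMerger and its BitSet docsets are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical helper; assumes segments are concatenated without deletions. */
class DocSetMerger {
    /** Concatenates per-segment docsets; maxDocs[i] is segment i's doc count. */
    static BitSet merge(BitSet[] segmentSets, int[] maxDocs) {
        BitSet merged = new BitSet();
        int docBase = 0;   // first doc id of the current segment in the merged segment
        for (int i = 0; i < segmentSets.length; i++) {
            BitSet set = segmentSets[i];
            for (int d = set.nextSetBit(0); d >= 0; d = set.nextSetBit(d + 1)) {
                merged.set(docBase + d);
            }
            docBase += maxDocs[i];
        }
        return merged;
    }
}
```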


[jira] Created: (SOLR-1619) Cache documents by their internal ID

2009-12-03 Thread Jason Rutherglen (JIRA)

Cache documents by their internal ID


 Key: SOLR-1619
 URL: https://issues.apache.org/jira/browse/SOLR-1619
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


Currently documents are cached by their Lucene docid; however, we can instead 
cache them using their schema-derived unique id.


[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-02 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784973#action_12784973
]

Jason Rutherglen commented on SOLR-1277:

If we're detecting node failure, it seems functional failures
of Solr itself should be detected too. The discussions thus
far seem to be around network or process failure, which is
usually either intermittent or terminal. Detecting measurable
increases/decreases in CPU, RAM consumption, OOMs, query
failures, and indexing failures due to bugs is probably more
important than detecting that the network is down, because they
are harder to detect and fix.

How is HBase handling the detection of functional issues in
relation to ZK?

Implement a Solr specific naming service (using Zookeeper)
--

Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch,
SOLR-1277.patch, zookeeper-3.2.1.jar

Original Estimate: 672h
Remaining Estimate: 672h


[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-02 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785014#action_12785014
]

Jason Rutherglen commented on SOLR-1277:

bq. The question then becomes what do you want to make automatic
vs those things that require operator intervention.

Right, I'd like the distributed Solr + ZK system to
automatically fail over to another server if there's a functional
software failure. Also, with a search system query times are
very important, and if they suddenly degrade on a replicated
server, the node needs to be removed and a new server brought
online (hopefully automatically). If Solr + ZK doesn't take out
a server whose query times are 10 times the average of the other
comparable replicated slave servers, then it's harder to
justify going live with it, in my humble opinion, because it's
not really solving the main reason to use a naming service.

While this may not be functionality we need in an initial
release, it's important to ensure our initial design does not
limit future functionality.
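
To make the "10 times the average" check concrete, here's a small
sketch. It assumes each replica's mean query time is already being
collected somewhere; SlowReplicaDetector, findSlow, and the factor
parameter are hypothetical names and are not tied to any ZooKeeper
API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical health check; how the timings are gathered is out of scope. */
class SlowReplicaDetector {
    /** Returns nodes whose mean query time exceeds factor times the mean of the others. */
    static List<String> findSlow(Map<String, Double> meanQueryMillis, double factor) {
        List<String> slow = new ArrayList<>();
        if (meanQueryMillis.size() < 2) {
            return slow;   // nothing to compare a single node against
        }
        double total = 0;
        for (double v : meanQueryMillis.values()) {
            total += v;
        }
        for (Map.Entry<String, Double> e : meanQueryMillis.entrySet()) {
            // Average of every node except the one being judged.
            double othersAvg = (total - e.getValue()) / (meanQueryMillis.size() - 1);
            if (e.getValue() > factor * othersAvg) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }
}
```

Nodes returned here would be candidates for removal from the set of
live replicas.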

Implement a Solr specific naming service (using Zookeeper)
--

Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch,
SOLR-1277.patch, zookeeper-3.2.1.jar

Original Estimate: 672h
Remaining Estimate: 672h


[jira] Commented: (SOLR-1308) Cache docsets and docs at the SegmentReader level

2009-12-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784668#action_12784668
]

Jason Rutherglen commented on SOLR-1308:

I'm taking a look at this. It's straightforward to cache and
reuse docsets per reader in SolrIndexSearcher; however, we're
passing docsets all over the place (i.e. UnInvertedField). We
can't exactly rip out DocSet without breaking most unit tests
and writing a bunch of facet merging code. We'd likely lose
functionality?

Will the MultiDocSet concept (SOLR-568) work as an easy way to
get something that works up and running? Then we can benchmark
and see if we've lost performance?

Cache docsets and docs at the SegmentReader level
-

Key: SOLR-1308
URL: https://issues.apache.org/jira/browse/SOLR-1308
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5

Original Estimate: 504h
Remaining Estimate: 504h

Solr caches docsets and documents at the top level Multi*Reader
level. After a commit, the caches are flushed. Reloading the
caches in near realtime (i.e. commits every 1s - 2min)
unnecessarily consumes IO resources, especially for largish
indexes.
We can cache docsets and documents at the SegmentReader level.
The cache settings in SolrConfig can be applied to the
individual SR caches.


[jira] Created: (SOLR-1614) Search in Hadoop

2009-11-30 Thread Jason Rutherglen (JIRA)

Search in Hadoop


 Key: SOLR-1614
 URL: https://issues.apache.org/jira/browse/SOLR-1614
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


What's the use case? Sometimes queries are expensive (such as
regex) or one has indexes located in HDFS, that then need to be
searched on. By leveraging Hadoop, these non-time sensitive
queries may be executed without dynamically deploying the
indexes to new Solr servers. 

We'll download the index out of HDFS (assuming they're zipped),
perform the queries in a batch on the index shard, then merge
the results either using a Solr query results priority queue, or
simply using Hadoop's built in merge sorting. 

The query file will be encoded in JSON format (ID, query,
numresults, fields). The shards file will simply contain newline
delimited paths (HDFS or otherwise). The output can be a Solr
encoded results file per query.

I'm hoping to add an actual Hadoop unit test.
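
The priority queue variant of the merge step could look roughly like
this. It's a sketch only: ShardResultMerger and ScoredDoc are
hypothetical, it ranks purely by score, and it uses
java.util.PriorityQueue rather than any Solr or Hadoop class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical merger for per-shard query results. */
class ShardResultMerger {
    static final class ScoredDoc {
        final String id;
        final double score;

        ScoredDoc(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Keeps the n highest-scoring docs across all shards, best first. */
    static List<ScoredDoc> topN(List<List<ScoredDoc>> perShard, int n) {
        // Min-heap on score: the head is always the weakest hit kept so far.
        PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));
        for (List<ScoredDoc> shard : perShard) {
            for (ScoredDoc d : shard) {
                heap.offer(d);
                if (heap.size() > n) {
                    heap.poll();   // drop the current weakest
                }
            }
        }
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort((a, b) -> Double.compare(b.score, a.score));   // best first
        return out;
    }
}
```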


[jira] Created: (SOLR-1609) Create a cache implementation that limits itself to a given RAM size

2009-11-29 Thread Jason Rutherglen (JIRA)

Create a cache implementation that limits itself to a given RAM size


 Key: SOLR-1609
 URL: https://issues.apache.org/jira/browse/SOLR-1609
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


This is a spinoff from the unrelated SOLR-1308. We can limit the
cache sizes by estimated RAM usage. I think in some cases this
is a better approach when compared with using soft references as
this will effectively limit the cache RAM used. A soft reference
cache will fill the max heap before the collector clears
excess cached items, which in some cases may not be the desired
behavior.
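
A minimal sketch of a cache bounded by estimated RAM, assuming the
caller can supply a per-value size estimate. RamBoundedCache and the
sizeOf function are hypothetical; this is an LRU built on
LinkedHashMap, not the SolrCache interface.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical LRU cache bounded by estimated bytes rather than entry count. */
class RamBoundedCache<K, V> {
    private final long maxBytes;
    private final ToLongFunction<V> sizeOf;   // caller-supplied size estimate
    private long usedBytes = 0;
    // accessOrder=true makes iteration order least-recently-used first.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);

    RamBoundedCache(long maxBytes, ToLongFunction<V> sizeOf) {
        this.maxBytes = maxBytes;
        this.sizeOf = sizeOf;
    }

    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null) {
            usedBytes -= sizeOf.applyAsLong(old);
        }
        usedBytes += sizeOf.applyAsLong(value);
        // Evict least-recently-used entries until we are back under budget.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            usedBytes -= sizeOf.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

Unlike soft references, eviction here happens as soon as the
estimate crosses the limit, regardless of how much heap is free.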


[jira] Created: (SOLR-1610) Add generics to SolrCache

2009-11-29 Thread Jason Rutherglen (JIRA)

Add generics to SolrCache
-

 Key: SOLR-1610
 URL: https://issues.apache.org/jira/browse/SOLR-1610
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5


Seems fairly simple for SolrCache to have generics.
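
Roughly what a generified cache could look like from the caller's
side. TypedCache and MapBackedCache are hypothetical names with a
deliberately tiny method set; the real SolrCache interface has more
to it (init, warming, statistics) that this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical generic cache interface; method names are illustrative. */
interface TypedCache<K, V> {
    V put(K key, V value);
    V get(K key);
    int size();
}

/** Trivial map-backed implementation, just to show the typed call sites. */
class MapBackedCache<K, V> implements TypedCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public V put(K key, V value) {
        return map.put(key, value);
    }

    public V get(K key) {
        return map.get(key);
    }

    public int size() {
        return map.size();
    }
}
```

With type parameters, callers no longer need to cast the result of
get.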


[jira] Updated: (SOLR-1610) Add generics to SolrCache

2009-11-29 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1610:
---

Attachment: SOLR-1610.patch

Compiles, ran some of the unit tests.  Not sure what else needs to be done?

 Add generics to SolrCache
 -

 Key: SOLR-1610
 URL: https://issues.apache.org/jira/browse/SOLR-1610
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1610.patch


 Seems fairly simple for SolrCache to have generics.


[jira] Created: (SOLR-1606) Integrate Near Realtime

2009-11-28 Thread Jason Rutherglen (JIRA)

Integrate Near Realtime 


 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


We'll integrate IndexWriter.getReader.


[jira] Updated: (SOLR-1606) Integrate Near Realtime

2009-11-28 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1606:
---

Attachment: SOLR-1606.patch

Solr config can have an index nrt (true|false), or commit can specify the nrt 
var.  With nrt=true, when creating a new searcher we call getReader.  

 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.


[jira] Commented: (SOLR-1578) Develop a Spatial Query Parser

2009-11-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780184#action_12780184
 ] 

Jason Rutherglen commented on SOLR-1578:


GBase http://code.google.com/apis/base/docs/2.0/query-lang-spec.html (Locations 
section at the bottom of the page) has a query syntax for spatial queries (i.e. 
@+40.75-074.00 + 5mi)

 Develop a Spatial Query Parser
 --

 Key: SOLR-1578
 URL: https://issues.apache.org/jira/browse/SOLR-1578
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 1.5


 Given all the work around spatial, it would be beneficial if Solr had a query 
 parser for dealing with spatial queries.  For starters, something that used 
 geonames data or maybe even Google Maps API would be really useful.  Longer 
 term, a spatial grammar that can robustly handle all the vagaries of 
 addresses, etc. would be really cool.
 Refs: 
 [1] http://www.geonames.org/export/client-libraries.html (note the Java 
 client is ASL)
 [2] Data from geo names: http://download.geonames.org/export/dump/


[jira] Updated: (SOLR-1506) Search multiple cores using MultiReader

2009-11-09 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1506:
---

Attachment: SOLR-1506.patch

MultiReader doesn't support reopen with the readOnly parameter.  This patch 
adds a test case for commit on the proxy, and a workaround (if unsupported is 
caught, then regular reopen is called).
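
The shape of the workaround, as a sketch. ReopenableReader is a
hypothetical stand-in interface so the example is self-contained; it
is not Lucene's IndexReader, and the exception type caught here is
an assumption.

```java
/** Hypothetical stand-in for a reader; not Lucene's IndexReader API. */
interface ReopenableReader {
    ReopenableReader reopen();
    ReopenableReader reopen(boolean readOnly);
}

class ReopenFallback {
    /** Tries reopen(readOnly) first, then falls back to the plain reopen(). */
    static ReopenableReader reopen(ReopenableReader reader, boolean readOnly) {
        try {
            return reader.reopen(readOnly);
        } catch (UnsupportedOperationException e) {
            // This reader type rejects the readOnly variant; use regular reopen.
            return reader.reopen();
        }
    }
}
```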

 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch, SOLR-1506.patch, SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching however
 SOLR-1477, if it gets completed, provides that out of the box.


[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader

2009-11-03 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773104#action_12773104
 ] 

Jason Rutherglen commented on SOLR-1506:


Commit doesn't work because reopen isn't supported by MultiReader.

 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch, SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching however
 SOLR-1477, if it gets completed, provides that out of the box.


[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader

2009-11-02 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12772793#action_12772793
 ] 

Jason Rutherglen commented on SOLR-1506:


There's a bug here with getting the status of multiple cores:

SEVERE: org.apache.solr.common.SolrException: Error handling 'status' action 
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:362)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:131)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: This reader does not 
support this method.
at org.apache.lucene.index.IndexReader.directory(IndexReader.java:592)
at 
org.apache.solr.search.SolrIndexReader.directory(SolrIndexReader.java:222)
at 
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:442)
at 
org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:449)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:353


 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch, SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching however
 SOLR-1477, if it gets completed, provides that out of the box.


[jira] Commented: (SOLR-1395) Integrate Katta

2009-10-29 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12771704#action_12771704
 ] 

Jason Rutherglen commented on SOLR-1395:


Pravin,

I'll review the test case when I can.  Did you download and apply the latest 
patch?  

 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
 katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, 
 solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431.patch, 
 SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
 test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop


[jira] Commented: (SOLR-1477) Search on multi-tier cores

2009-10-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767594#action_12767594
 ] 

Jason Rutherglen commented on SOLR-1477:


The use case is scaling to hundreds of servers where a single distributed 
search proxy server becomes a bottleneck, or simply querying multiple local 
cores.  Either way the same multi-tiered distributed search module will be 
highly effective.

 Search on multi-tier cores
 --

 Key: SOLR-1477
 URL: https://issues.apache.org/jira/browse/SOLR-1477
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, 
 SOLR-1477.patch, SOLR-1477.patch


 Search on cores in the container, using distributed search.


[jira] Commented: (SOLR-1477) Search on multi-tier cores

2009-10-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767600#action_12767600
 ] 

Jason Rutherglen commented on SOLR-1477:


The way the process should work for this patch is:

1) Incoming query to shard proxy server
2) getids passed to N intermediary proxy servers (i-proxies)
3) Each i-proxy forwards the getids call to Y Solr servers
4) Y Solr servers respond, i-proxy merges the ids, and sends the response to 
the toplevel proxy server from step 1)
5) The toplevel proxy merges the results from the i-proxies
6) getdocs is passed from proxy 1) to the i-proxies
7) i-proxies call Solr servers to obtain documents (the actual shard the 
documents exist on needs to be passed to the i-proxy to avoid redundancy)
8) i-proxies send the results of getdocs to the toplevel proxy
9) The request is completed.

I know that's muddy but it's a start.
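
The getids merging in steps 2) through 5) can be sketched as the
same top-N merge applied at each tier. TieredMerge and Hit are
hypothetical names; each hit carries its shard so that the getdocs
phase in step 7) knows where the document lives. Networking and the
actual Solr request handling are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical merge used by both the i-proxies and the toplevel proxy. */
class TieredMerge {
    /** A scored document id plus the shard it came from (needed for getdocs). */
    static final class Hit {
        final String id;
        final float score;
        final String shard;

        Hit(String id, float score, String shard) {
            this.id = id;
            this.score = score;
            this.shard = shard;
        }
    }

    /** Merges already-fetched hit lists and keeps the top n by score. */
    static List<Hit> mergeTopN(List<List<Hit>> lists, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> l : lists) {
            all.addAll(l);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

An i-proxy would call mergeTopN over its Solr servers' responses,
and the toplevel proxy would call it again over the i-proxy results.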

 Search on multi-tier cores
 --

 Key: SOLR-1477
 URL: https://issues.apache.org/jira/browse/SOLR-1477
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, 
 SOLR-1477.patch, SOLR-1477.patch


 Search on cores in the container, using distributed search.


[jira] Updated: (SOLR-1301) Solr + Hadoop

2009-10-19 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated SOLR-1301:
---

Attachment: SOLR-1301.patch

Here's an update that includes the change Jason mentioned above
(needHeartBeat in SRW.close). I've run this patch in production,
however I was unable to turn off the Solr update logs due to
complexities with SLF4J layering in Hadoop. I had to comment out
the logging lines in Solr to ensure the Hadoop logs did not fill
up.

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5


[jira] Updated: (SOLR-1506) Search multiple cores using MultiReader

2009-10-18 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1506:
---

Attachment: SOLR-1506.patch

Fixes a bug, added Apache headers

 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch, SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching however
 SOLR-1477, if it gets completed, provides that out of the box.


[jira] Updated: (SOLR-1477) Search on multi-tier cores

2009-10-18 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1477:
---

Priority: Minor  (was: Trivial)
 Summary: Search on multi-tier cores  (was: Search on local cores)

 Search on multi-tier cores
 --

 Key: SOLR-1477
 URL: https://issues.apache.org/jira/browse/SOLR-1477
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, 
 SOLR-1477.patch, SOLR-1477.patch


 Search on cores in the container, using distributed search.


[jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766641#action_12766641
 ] 

Jason Rutherglen commented on SOLR-1513:


Noble, before implementing, I was wondering if there's performance testing code 
for ConcurrentLRUCache in case Google Collections somehow slows things down?

 Use Google Collections in ConcurrentLRUCache
 

 Key: SOLR-1513
 URL: https://issues.apache.org/jira/browse/SOLR-1513
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


 ConcurrentHashMap is used in ConcurrentLRUCache.  The Google Collections 
 concurrent map implementation allows for soft values that are great for 
 caches that potentially exceed the allocated heap.  Though I suppose Solr 
 caches usually don't use too much RAM?
 http://code.google.com/p/google-collections/


[jira] Updated: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-16 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1513:
---

Attachment: google-collect-snapshot.jar
SOLR-1513.patch

Here's a basic implementation. It needs testing for performance
and for what happens if a value is removed before a key (in which
case the map could return null?). There are a number of
configurable params, so we'll add those as options for solrconfig.
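
To illustrate the "value removed before a key" case, here's a sketch
of a soft-valued map built only on the JDK. SoftValueCache is
hypothetical and is not the Google Collections implementation or the
attached patch; it just shows where a null can come back for a key
that was put earlier.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical soft-valued cache using only the JDK, not Google Collections. */
class SoftValueCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    /** Returns null both for a missing key and for a value the GC has cleared. */
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key, ref);   // drop the stale mapping
        }
        return value;
    }

    int size() {
        return map.size();
    }
}
```

Callers have to treat a null from get as a cache miss and recompute
the value.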



 Use Google Collections in ConcurrentLRUCache
 

 Key: SOLR-1513
 URL: https://issues.apache.org/jira/browse/SOLR-1513
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: google-collect-snapshot.jar, SOLR-1513.patch


 ConcurrentHashMap is used in ConcurrentLRUCache.  The Google Collections 
 concurrent map implementation allows for soft values that are great for 
 caches that potentially exceed the allocated heap.  Though I suppose Solr 
 caches usually don't use too much RAM?
 http://code.google.com/p/google-collections/


[jira] Created: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-15 Thread Jason Rutherglen (JIRA)

Use Google Collections in ConcurrentLRUCache


 Key: SOLR-1513
 URL: https://issues.apache.org/jira/browse/SOLR-1513
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


ConcurrentHashMap is used in ConcurrentLRUCache.  The Google Collections 
concurrent map implementation allows for soft values that are great for caches 
that potentially exceed the allocated heap.  Though I suppose Solr caches 
usually don't use too much RAM?

http://code.google.com/p/google-collections/


[jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-15 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766356#action_12766356
 ] 

Jason Rutherglen commented on SOLR-1513:


I've tuned down my caches to avoid OOMs and swapping.  I'd rather the 
cache simply evict values before it starts swapping or hits an OOM.  

I think it would simply be an option, which I'd personally always have on! 

 Use Google Collections in ConcurrentLRUCache
 

 Key: SOLR-1513
 URL: https://issues.apache.org/jira/browse/SOLR-1513
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


 ConcurrentHashMap is used in ConcurrentLRUCache.  The Google Collections 
 concurrent map implementation allows for soft values that are great for 
 caches that potentially exceed the allocated heap.  Though I suppose Solr 
 caches usually don't use too much RAM?
 http://code.google.com/p/google-collections/


[jira] Commented: (SOLR-1301) Solr + Hadoop

2009-10-12 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12764745#action_12764745
]

Jason Rutherglen commented on SOLR-1301:

Thanks for the update Jason. It runs great, I've generated over a terabyte of
indexes using the patch. Now I'm trying to deploy them, and that's harder!

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 1.5


[jira] Created: (SOLR-1506) Search multiple cores using MultiReader

2009-10-11 Thread Jason Rutherglen (JIRA)

Search multiple cores using MultiReader
---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5


I need to search over multiple cores, and SOLR-1477 is more
complicated than expected, so here we'll create a MultiReader
over the cores to allow searching on them.

Maybe in the future we can add parallel searching; however,
SOLR-1477, if it gets completed, provides that out of the box.
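
As a rough picture of what a combined reader over several cores does,
the toy model below concatenates the document id spaces of the cores
and searches them one after another (sequentially, not in parallel).
It is not the Lucene MultiReader API or the SOLR-1506 patch; the names
CombinedReaderSketch and CoreIndex are hypothetical, and each "core" is
just a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one reader spanning several cores. Not the Lucene API. */
class CombinedReaderSketch {

    /** Hypothetical stand-in for one core's index: a name plus its documents. */
    static final class CoreIndex {
        final String name;
        final List<String> docs;

        CoreIndex(String name, List<String> docs) {
            this.name = name;
            this.docs = docs;
        }
    }

    private final List<CoreIndex> cores;
    // starts[i] is the first global doc id of core i; the last slot holds the total.
    private final int[] starts;

    CombinedReaderSketch(List<CoreIndex> cores) {
        this.cores = cores;
        this.starts = new int[cores.size() + 1];
        for (int i = 0; i < cores.size(); i++) {
            starts[i + 1] = starts[i] + cores.get(i).docs.size();
        }
    }

    /** Total number of documents across all cores. */
    int maxDoc() {
        return starts[cores.size()];
    }

    /** Scans each core in turn and returns global ids of documents containing the term. */
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < cores.size(); i++) {
            List<String> docs = cores.get(i).docs;
            for (int local = 0; local < docs.size(); local++) {
                if (docs.get(local).contains(term)) {
                    hits.add(starts[i] + local);
                }
            }
        }
        return hits;
    }

    /** Maps a global doc id back to the name of the core that holds it. */
    String coreOf(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc()) {
            throw new IndexOutOfBoundsException("doc " + globalDoc);
        }
        for (int i = 0; i < cores.size(); i++) {
            if (globalDoc < starts[i + 1]) {
                return cores.get(i).name;
            }
        }
        throw new IllegalStateException("unreachable for a valid doc id");
    }
}
```

The offset table is what lets a single hit list refer to documents in
different cores while still being traceable back to its source core.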


[jira] Updated: (SOLR-1506) Search multiple cores using MultiReader

2009-10-11 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1506:
---

Attachment: SOLR-1506.patch

Well, it seems to work, though I had to comment out the reader.directory() call 
in SolrCore.  I'm not sure what to do there yet, but this is good enough for 
now.  

 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching; however,
 SOLR-1477, if it gets completed, provides that out of the box.


[jira] Created: (SOLR-1502) Add form to perform updates

2009-10-09 Thread Jason Rutherglen (JIRA)

Add form to perform updates
---

 Key: SOLR-1502
 URL: https://issues.apache.org/jira/browse/SOLR-1502
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


A convenience UI to perform updates via the Web UI.

