Afternoon (here anyway),
I think understanding Solr's overall approach (whose design I believe came out of the thread you've referenced) is also a good step here. Even if you can't re-use the hard links trick, you might be able to reuse its snapshotting & index distribution protocol.
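For reference, the "hard links trick" is roughly this: take a snapshot by hard-linking every index file into a fresh directory, which costs almost nothing and copies no data, and then ship that snapshot out to the search nodes. A toy illustration of just the linking step (made-up paths, plain java.nio.file rather than Solr's own scripts, and assuming a flat directory of plain files):

    // Toy illustration of a hard-link snapshot: the new directory shares the same
    // on-disk data as the live index, so creating it is nearly free.
    import java.io.IOException;
    import java.nio.file.*;

    public class Snapshot {
      public static void main(String[] args) throws IOException {
        Path index = Paths.get("/data/index");   // live index (made-up path)
        Path snap  = Files.createDirectory(
            Paths.get("/data/snapshot-" + System.currentTimeMillis()));
        try (DirectoryStream<Path> files = Files.newDirectoryStream(index)) {
          for (Path f : files) {
            Files.createLink(snap.resolve(f.getFileName()), f);  // hard link, no copy
          }
        }
      }
    }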
I'll have to take a better look into Solr; I was just peripherally aware of its existence, thanks for the reminder. However, I have been working on some "bottoms up" improvements to
Lucene (getting native OS locking working and [separate but related] "lock-less commits") that I think could be related to some of the issues you're seeing with HDFS -- see below:

Right, with the "lock-less commits" patch we never rename a file and also never re-use a file name (i.e., making Lucene's use of the filesystem "write once").
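To make the "write once" pattern concrete (this is just a sketch of the idea, not the actual patch, and the names are illustrative): each commit writes its segments file under a brand-new, never-reused name, so nothing on disk is ever renamed or overwritten once written:

    // Sketch only (not the real patch): pick a generation number that has never
    // been used and write the new commit point under that fresh name.
    import java.io.IOException;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IndexOutput;

    class WriteOnceCommit {
      static IndexOutput newCommitFile(Directory dir) throws IOException {
        long nextGen = 1;                                   // next unused generation
        for (String name : dir.list()) {
          if (name.startsWith("segments_")) {
            long gen = Long.parseLong(name.substring("segments_".length()));
            nextGen = Math.max(nextGen, gen + 1);
          }
        }
        return dir.createOutput("segments_" + nextGen);     // fresh name, never renamed or reused
      }
    }

The point being that a reader on a remote filesystem only ever opens files that will never change underneath it.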
This is very interesting! I don't know enough about HDFS (yet!). On a very quick read, I like that it's a "write once" filesystem, because that's a good match for lock-less commits.
It also takes care of redundancy, and in extremely large systems it can help keep costs down (at least by my math). It's a really cool thing.

That exception looks disturbingly similar to the ones Lucene hits on NFS. See here for the gory details: http://issues.apache.org/jira/browse/LUCENE-673

I think even if lock-less commits ("write once") enable sharing a single copy of the index over remote filesystems like HDFS, NFS, or SMB/CIFS, whether or not that's performant enough (vs. replicating copies to local filesystems, which are presumably quite a bit faster at IO, at the expense of the local storage consumed) would still be a big open question.
I wondered if it was similar to the NFS problems, but I don't know enough about the underlying Hadoop filesystem implementation to determine whether that was the culprit.

I worked late last night and came up with what I think is a reasonable pure-Java solution. Basically, I didn't want to assume that I know where the index is being stored; from both the indexer's and the searcher's point of view, I was hoping they could just see the abstract Directory. I take advantage of the way Lucene stores the index by only copying over files that have been modified, so as the index gets bigger this can be tuned to be relatively painless.

    synchronized(Node.SEGMENTLOCK){ // Don't let anyone else in the process mess with it.
      byte[] buf = new byte[BYTECOUNT];
      try {
        localDir = FSDirectory.getDirectory("/tmp/index", true);
        // hdfsDir is the HDFS-backed Directory holding the master copy;
        // localDir is the local mirror the searcher reads from.
        String[] files = hdfsDir.list();
        for(int i = 0; i < files.length; i++){
          // Copy a file only if we don't have it yet, or if the master copy is newer.
          if(!localDir.fileExists(files[i]) ||
             hdfsDir.fileModified(files[i]) > localDir.fileModified(files[i])){
            IndexOutput io = localDir.createOutput(files[i]);
            IndexInput ii = hdfsDir.openInput(files[i]);
            while(ii.getFilePointer() < ii.length()){
              int eating = Math.min(BYTECOUNT, (int)ii.length() - (int)ii.getFilePointer());
              ii.readBytes(buf, 0, eating);
              io.writeBytes(buf, eating);
            }
            ii.close();
            io.flush();
            io.close();
          }
        }
      } catch (Exception e){
        e.printStackTrace();
        return;
      }
    }
    is = new IndexSearcher(localDir);

The only problem here is that the searchers and the indexer have to run in the same process so that the synchronized block can lock the indexer out while the files are copied. For now that's alright, since there are few searches going on while the indexing is happening. I'm considering using JMS or some other messaging system to do the locking and allow me to split the processes; this is similar to solution 3 in my previous post. I'll have to keep an eye out for lock-less commits, and if you want I can try the patch and see if it fixes the HDFS problems.

Cheers,
Chris
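P.S. On splitting the processes: rather than JMS, I might be able to get away with Lucene's own Lock API, since Directory.makeLock gives you a lock file that separate JVMs on the same box can both see. Completely untested sketch, with a made-up lock name:

    // Untested sketch: coordinate the copy with a Directory-level lock instead of
    // the in-process synchronized block, so the indexer and searcher need not share a JVM.
    org.apache.lucene.store.Lock copyLock = localDir.makeLock("copy.lock");
    if (copyLock.obtain()) {            // single attempt; retries/backoff omitted
      try {
        // ... copy modified files from hdfsDir into localDir, as above ...
      } finally {
        copyLock.release();
      }
    }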