[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743168#comment-13743168
 ] 

Patrick Hunt commented on SOLR-5150:
------------------------------------

Hi [~markrmil...@gmail.com], thanks for filing this while I was out. I was 
trying to track down another issue and happened across it while reviewing code 
(then noticed that Blur had changed from the original).

I realized the seekInternal change was needed while on vacation and was going 
to mention it, but it looks like you fixed it already. ;-)

I reviewed the HDFS client code for readInternal with a member of our HDFS team 
before generating the original patch. Based on the feedback I got, my 
understanding was that doing the seek followed by the readFully should have 
given the highest performance. It's interesting that the query performance was 
so negatively impacted. We should follow up with those folks again. Perhaps you 
could provide more insight (than I can) into how Lucene accesses the underlying 
filesystem for query-based reads vs. other access patterns? That might help us 
get more insight from the HDFS devs. Perhaps there is some way to trace those 
accesses...
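
To make the two access patterns concrete for that discussion, here is a rough 
sketch. It is a local-file analogue built only on java.io and java.nio, not the 
HDFS client and not the actual HdfsIndexInput code; the class and method names 
are hypothetical. The first method seeks and then reads, so it has to lock 
around the shared file pointer. The second passes the position with each read 
and loops until the buffer is full, since a single read may return fewer bytes 
than requested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical illustration only: a local-file stand-in for the two patterns.
class PositionalReadSketch {

    // Pattern 1: seek, then readFully. The file pointer is shared state, so
    // readers that share one handle (e.g. clones) must synchronize across
    // the two calls.
    static void seekThenReadFully(RandomAccessFile file, long pos,
                                  byte[] buf, int off, int len) throws IOException {
        synchronized (file) {
            file.seek(pos);
            file.readFully(buf, off, len);
        }
    }

    // Pattern 2: positional read. The offset travels with each call and the
    // channel's own position is left untouched, so no lock is needed here.
    // The loop handles short reads instead of ignoring the return value.
    static void positionalReadFully(FileChannel channel, long pos,
                                    byte[] buf, int off, int len) throws IOException {
        ByteBuffer dst = ByteBuffer.wrap(buf, off, len);
        while (dst.hasRemaining()) {
            int n = channel.read(dst, pos);
            if (n < 0) {
                throw new EOFException("hit end of file with "
                        + dst.remaining() + " bytes still unread");
            }
            pos += n;
        }
    }
}
```

Whether the locking in the first pattern accounts for the query slowdown we 
saw is exactly the open question above; the sketch only shows where the shared 
state lives.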

We have not yet tried "short circuit local HDFS client reads" (see 12.11.2 here 
http://hbase.apache.org/book/perf.hdfs.html) but we should at some point (soon) 
and that will further complicate things. Based on the results other clients 
have seen we should see significant performance benefits from that (at least 
when the blocks are indeed local).

                
> HdfsIndexInput may not fully read requested bytes.
> --------------------------------------------------
>
>                 Key: SOLR-5150
>                 URL: https://issues.apache.org/jira/browse/SOLR-5150
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.4
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 4.5, 5.0
>
>         Attachments: SOLR-5150.patch
>
>
> Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - 
> the read call we are using may not read all of the requested bytes - it 
> returns the number of bytes actually read - which we ignore.
> Blur moved to using a seek and then readFully call - synchronizing across the 
> two calls to deal with clones.
> We have seen that really kills performance, and using the readFully call that 
> lets you pass the position rather than first doing a seek, performs much 
> better and does not require the synchronization.
> I also noticed that the seekInternal impl should not seek but be a no op 
> since we are seeking on the read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
