On Monday, September 20, 2010, Bennie Schut <bsc...@ebuddy.com> wrote:
> Hi all,
>
> We are sometimes getting file not found exceptions while running large
> queries on hive. During these large queries we also import data on the
> partitions we are querying which raises a question for us. How does hive
> handle data which is being modified in the background?
>
> We use insert overwrite on the partitions so I can imagine the large
> query can be surprised with some new files and some missing old files.
>
> If others are experiencing this how do they work around this? Perhaps
> partition on 2 keys so you don’t overwrite existing data?
>
> Thanks for any pointers on this.
>
> Bennie.

I do not think Hive/MapReduce has a great way of dealing with moving targets. If the content changes between split computation (getSplits) and task execution, a task can go looking for a file that is no longer there. We have a program, file crush, which crushes up small files. We implemented read and write locks on a per-table basis. The new ZooKeeper (zk) locking might well handle this better.
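
To make the per-table read/write locking idea concrete, here is a minimal sketch in Python. It is not our actual implementation: the class name TableLocks and the table name "events" are made up for illustration, and the lock lives inside one process only. A real setup would need a lock service shared by every client (ZooKeeper, for example). Long queries take a read lock, and anything that rewrites files (insert overwrite, crushing) takes the write lock.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class TableLocks:
    """Hypothetical per-table read/write locks (in-process sketch only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = defaultdict(int)  # table -> number of active readers
        self._writers = set()             # tables currently being rewritten

    @contextmanager
    def read(self, table):
        with self._cond:
            # Block while a writer is rewriting this table's files.
            while table in self._writers:
                self._cond.wait()
            self._readers[table] += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers[table] -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self, table):
        with self._cond:
            # Block until the table has no readers and no other writer.
            while table in self._writers or self._readers[table] > 0:
                self._cond.wait()
            self._writers.add(table)
        try:
            yield
        finally:
            with self._cond:
                self._writers.discard(table)
                self._cond.notify_all()


locks = TableLocks()

with locks.read("events"):
    pass  # run the long query here; many readers may hold this at once

with locks.write("events"):
    pass  # run the insert overwrite or the crush here; exclusive access
```

Note that this simple version lets a steady stream of readers starve a writer, so a production version would probably want to give waiting writers priority.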