Migration to 0.20.1 issue

2009-12-12 Thread Songting Chen
Hi,
  I encountered a strange problem when trying to migrate to Hadoop 0.20.1.
  I have some old Map/Reduce jobs (written against the old API) that compile
and run on the new cluster, but they return no results.
  On a closer look, the configure() and close() methods of the Mapper class
are called during the map phase, but the map() method is not. That is why the
jobs output 0 records.
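
One possible explanation, offered as an assumption rather than a diagnosis: with
the old API the runner calls configure() once, then map() once per input record,
then close(). If the input yields no records on the new cluster, configure() and
close() still run but map() never does. The sketch below models that lifecycle
with hypothetical stand-in types (SimpleMapper, TracingMapper, run); it does not
use the Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the old-API mapper lifecycle; NOT the Hadoop classes.
public class MapperLifecycleSketch {

    /** Simplified mapper contract: configure once, map per record, close once. */
    interface SimpleMapper {
        void configure();
        void map(String record, List<String> output);
        void close();
    }

    /** Mapper that records which lifecycle methods were invoked. */
    static class TracingMapper implements SimpleMapper {
        final List<String> calls = new ArrayList<>();

        public void configure() { calls.add("configure"); }

        public void map(String record, List<String> output) {
            calls.add("map");
            output.add(record.toUpperCase());
        }

        public void close() { calls.add("close"); }
    }

    /** Drives the mapper one record at a time, closing it even on failure. */
    static List<String> run(SimpleMapper mapper, Iterator<String> records) {
        List<String> output = new ArrayList<>();
        mapper.configure();
        try {
            while (records.hasNext()) {
                mapper.map(records.next(), output);
            }
        } finally {
            mapper.close();
        }
        return output;
    }

    public static void main(String[] args) {
        // Empty input: map() is never reached.
        TracingMapper empty = new TracingMapper();
        List<String> out = run(empty, new ArrayList<String>().iterator());
        System.out.println(empty.calls + " -> " + out.size() + " records");
        // prints: [configure, close] -> 0 records

        // Non-empty input: map() runs once per record.
        TracingMapper nonEmpty = new TracingMapper();
        out = run(nonEmpty, Arrays.asList("a", "b").iterator());
        System.out.println(nonEmpty.calls + " -> " + out.size() + " records");
        // prints: [configure, map, map, close] -> 2 records
    }
}
```

If this matches the symptom, checking whether the job's input path and input
format actually produce records on the new cluster would be a reasonable first
step.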

Any idea what's going on?
Thanks,
-Songting


JobTracker hangs after 400-500 jobs

2009-07-06 Thread Songting Chen

After 400-500 jobs the Hadoop cluster stops responding. Stopping and
restarting Map/Reduce resolves the problem.

Note: HDFS has no such issue.
Is this a common problem (we use 0.19)?

Thanks,
-Songting


HDFS inconsistency issue

2013-01-02 Thread Songting Chen
We are hitting a strange HDFS issue after a good number of Hadoop nodes
crashed simultaneously.

After all the downed servers came back, HDFS reported 1 missing block. But
the file that the block belonged to had already been deleted after the
crash, so it is an orphan block.
Because the block doesn't belong to any file, there is no way to delete it.
fsck also failed with the exception:

cause:java.io.IOException: Premature EOF
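
To make the orphan-block condition concrete, here is a toy model of it, under
the assumption that an orphan is a block still being tracked that no existing
file references. All names (OrphanBlockSketch, findOrphans), paths, and block
IDs are hypothetical and for illustration only; this is not the HDFS API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names and data, not HDFS internals.
public class OrphanBlockSketch {

    /** Returns the tracked block IDs that no existing file references. */
    static Set<Long> findOrphans(Map<String, List<Long>> fileToBlocks,
                                 Set<Long> trackedBlocks) {
        // Collect every block ID that still belongs to some file.
        Set<Long> referenced = new HashSet<>();
        for (List<Long> blocks : fileToBlocks.values()) {
            referenced.addAll(blocks);
        }
        // Whatever is tracked but unreferenced is an orphan.
        Set<Long> orphans = new TreeSet<>(trackedBlocks);
        orphans.removeAll(referenced);
        return orphans;
    }

    public static void main(String[] args) {
        // One surviving file; the file that owned block 1003 was deleted.
        Map<String, List<Long>> files = new HashMap<>();
        files.put("/data/a", Arrays.asList(1001L, 1002L));

        Set<Long> tracked = new HashSet<>(Arrays.asList(1001L, 1002L, 1003L));

        System.out.println(findOrphans(files, tracked)); // prints: [1003]
    }
}
```

In this model a delete that goes through the file path can never reach block
1003, which mirrors the situation described above.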

A side effect now is that HDFS won't free up any space, even after the Trash
bin was emptied. The space utilization has just constantly gone up.

Any suggestion on how to resolve this issue is highly appreciated!

Thanks,
-Songting



Data Platform Engineer at Turn Inc.

2011-01-07 Thread Songting Chen

If you're passionate about large-scale distributed systems, petabyte data
warehouses, NoSQL key-value stores, and high-throughput real-time reporting
systems, and are interested in joining a world-class engineering team, you
might well be the person we're looking for. This hands-on role contributes to
the organization's success through expertise in large-scale MapReduce systems,
advanced database programming, and database architecture.

We are a small team, but we build innovative and powerful systems. Our results
have been published at top-tier conferences such as VLDB
(http://www.vldb2010.org/proceedings/files/papers/I08.pdf).

If any of the following areas interest you, please send your resume to 
j...@turn.com
* Hybrid MapReduce/database systems
* Distributed and parallel computing
* Performance tuning and optimization
* Large-scale semi-structured data stores
* Real-time reporting systems in a 24 x 7 environment
* Turning research results into enterprise-class software

About Turn Inc.

Turn was founded to bring the efficiencies of search to digital advertising
and empower the world's premier advertising agencies and brands to reach
custom audiences at scale. We are a software and services company with the
industry's only end-to-end platform for delivering the most effective
data-driven digital advertising in the world. Our technology infrastructure,
self-service interface, optimization algorithms, real-time analytics, and
interoperability represent the future of media and data management. The
company is based in Silicon Valley with locations in New York City, Charlotte,
Chicago, London, Los Angeles, and San Francisco.

We are a rapidly growing, well-funded startup in Redwood City, CA, with a
growing business, a working business model, and a seasoned executive team.
We're changing the way the world thinks about online advertising, and we are
looking for talented engineers to help us take it to the next level.