Re: HDFS - millions of files in one directory?

2009-01-23 Thread Philip (flip) Kromer
I ran into this problem, hard, and I can vouch that it is not a Windows-only
problem. ReiserFS, ext3 and OS X's HFS+ become cripplingly slow with more
than a few hundred thousand files in the same directory. (The operation to
correct this mistake took a week to run.)  That is one of several hard
lessons I learned about "don't write your scraper to replicate the path
structure of each document as a file on disk."

Cascading the directory structure works, but sucks in various other ways,
and itself stops scaling after a while.  What I eventually realized is that
I was using the filesystem as a particularly wrongheaded document database,
and that the metadata delivery of a filesystem just doesn't work for this.

Since in our application the files are text and are immutable, our ad hoc
solution is to encode and serialize each file with all its metadata, one per
line, into a flat file.
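
What that looks like in practice - a minimal sketch, assuming a tab-separated
layout with the body Base64-encoded (the field order and encoding are my
assumptions, not necessarily flip's actual format):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Base64;

public class FlatFileWriter {
  // Appends one document per line: url <TAB> fetchTimeMillis <TAB> base64(body).
  // Base64 hides any tabs/newlines inside the body, so a plain line-oriented
  // reader (or a MapReduce TextInputFormat) can split records safely.
  public static void append(BufferedWriter out, String url, long fetchTime, byte[] body)
      throws IOException {
    out.write(url);
    out.write('\t');
    out.write(Long.toString(fetchTime));
    out.write('\t');
    out.write(Base64.getEncoder().encodeToString(body));
    out.newLine();
  }

  public static void main(String[] args) throws IOException {
    try (BufferedWriter out = new BufferedWriter(new FileWriter("docs.flat", true))) {
      append(out, "http://example.com/a", System.currentTimeMillis(),
             "document body goes here".getBytes("UTF-8"));
    }
  }
}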

A distributed database is probably the correct answer, but this is working
quite well for now and even has some advantages. (No-cost replication from
work to home or offline by rsync or thumb drive, for example.)

flip

On Fri, Jan 23, 2009 at 5:49 PM, Raghu Angadi  wrote:

> Mark Kerzner wrote:
>
>> But it would seem then that making a balanced directory tree would not
>> help
>> either - because there would be another binary search, correct? I assume,
>> either way it would be as fast as can be :)
>>
>
> But the cost of memory copies would be much less with a tree (when you add
> and delete files).
>
> Raghu.
>
>
>
>>
>> On Fri, Jan 23, 2009 at 5:08 PM, Raghu Angadi 
>> wrote:
>>
>>  If you are adding and deleting files in the directory, you might notice
>>> CPU
>>> penalty (for many loads, higher CPU on NN is not an issue). This is
>>> mainly
>>> because HDFS does a binary search on files in a directory each time it
>>> inserts a new file.
>>>
>>> If the directory is relatively idle, then there is no penalty.
>>>
>>> Raghu.
>>>
>>>
>>> Mark Kerzner wrote:
>>>
>>>  Hi,

 there is a performance penalty in Windows (pardon the expression) if you
 put
 too many files in the same directory. The OS becomes very slow, stops
 seeing
 them, and lies about their status to my Java requests. I do not know if
 this
 is also a problem in Linux, but in HDFS - do I need to balance a
 directory
 tree if I want to store millions of files, or can I put them all in the
 same
 directory?

 Thank you,
 Mark



>>
>


-- 
http://www.infochimps.org
Connected Open Free Data


Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi

Mark Kerzner wrote:

But it would seem then that making a balanced directory tree would not help
either - because there would be another binary search, correct? I assume,
either way it would be as fast as can be :)


But the cost of memory copies would be much less with a tree (when you 
add and delete files).


Raghu.




On Fri, Jan 23, 2009 at 5:08 PM, Raghu Angadi  wrote:


If you are adding and deleting files in the directory, you might notice CPU
penalty (for many loads, higher CPU on NN is not an issue). This is mainly
because HDFS does a binary search on files in a directory each time it
inserts a new file.

If the directory is relatively idle, then there is no penalty.

Raghu.


Mark Kerzner wrote:


Hi,

there is a performance penalty in Windows (pardon the expression) if you
put
too many files in the same directory. The OS becomes very slow, stops
seeing
them, and lies about their status to my Java requests. I do not know if
this
is also a problem in Linux, but in HDFS - do I need to balance a directory
tree if I want to store millions of files, or can I put them all in the
same
directory?

Thank you,
Mark








Re: HDFS - millions of files in one directory?

2009-01-23 Thread Mark Kerzner
But it would seem then that making a balanced directory tree would not help
either - because there would be another binary search, correct? I assume,
either way it would be as fast as can be :)



On Fri, Jan 23, 2009 at 5:08 PM, Raghu Angadi  wrote:

>
> If you are adding and deleting files in the directory, you might notice CPU
> penalty (for many loads, higher CPU on NN is not an issue). This is mainly
> because HDFS does a binary search on files in a directory each time it
> inserts a new file.
>
> If the directory is relatively idle, then there is no penalty.
>
> Raghu.
>
>
> Mark Kerzner wrote:
>
>> Hi,
>>
>> there is a performance penalty in Windows (pardon the expression) if you
>> put
>> too many files in the same directory. The OS becomes very slow, stops
>> seeing
>> them, and lies about their status to my Java requests. I do not know if
>> this
>> is also a problem in Linux, but in HDFS - do I need to balance a directory
>> tree if I want to store millions of files, or can I put them all in the
>> same
>> directory?
>>
>> Thank you,
>> Mark
>>
>>
>


Where is the metadata on HDFS?

2009-01-23 Thread tienduc_dinh

hi everyone,

I got a question, maybe you can help me.

- How can we get the metadata of a file on HDFS?

For example: if I have a file of, say, 2 GB on HDFS, this file is split
into many chunks and these chunks are distributed across many nodes. Is there
any way to find out which chunks belong to that file?

Any help will be appreciated, thanks lots.

Tien Duc Dinh
-- 
View this message in context: 
http://www.nabble.com/Where-are-the-meta-data-on-HDFS---tp21634677p21634677.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
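
For reference, the file-to-block mapping lives on the NameNode. Running
'bin/hadoop fsck <path> -files -blocks -locations' prints it from the command
line, and the FileSystem API exposes the same information programmatically; a
rough sketch (the class name and argument handling are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up hadoop-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);              // e.g. /user/me/bigfile
    FileStatus status = fs.getFileStatus(file);
    // Ask the NameNode which blocks cover the byte range [0, length) and where they live.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset() + " length=" + b.getLength()
          + " hosts=" + java.util.Arrays.toString(b.getHosts()));
    }
  }
}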



Re: hadoop balancing data

2009-01-23 Thread Hairong Kuang
%Remaining fluctuates much more than %dfs used. This is because DFS shares
the disks with mapred, and mapred tasks may temporarily use a lot of disk space.
So trying to keep the same %free is impossible most of the time.

Hairong


On 1/19/09 10:28 PM, "Billy Pearson"  wrote:

> Why do we not use the Remaining % in place of the Used % when we are
> selecting a datanode for new data and when running the balancer?
> From what I can tell we are using the % used and we do not factor in non-DFS
> Used at all.
> I see a datanode with only a 60GB hard drive fill up completely (100%) before
> the other servers that have 130+GB hard drives get half full.
> Seems like trying to keep the same % free on the drives in the cluster would
> be more optimal in production.
> I know this still may not be perfect but it would be nice if we tried.
> 
> Billy
> 
> 
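
To make the quoted scenario concrete (all numbers below are invented for
illustration): two datanodes can show nearly the same DFS Used % while having
very different real headroom once non-DFS usage is counted, which is both
Billy's observation and the reason Hairong notes that %Remaining moves around
so much.

public class PlacementMetrics {
  public static void main(String[] args) {
    // Hypothetical node A: small disk that is also busy with non-DFS (e.g. mapred temp) data.
    long capA = 60L << 30, nonDfsA = 30L << 30, dfsA = 20L << 30;
    // Hypothetical node B: bigger disk, mostly idle outside DFS.
    long capB = 130L << 30, nonDfsB = 5L << 30, dfsB = 40L << 30;
    print("A", capA, nonDfsA, dfsA);   // DFS used ~33%, remaining ~17% (10 GB free)
    print("B", capB, nonDfsB, dfsB);   // DFS used ~31%, remaining ~65% (85 GB free)
  }

  static void print(String name, long cap, long nonDfs, long dfs) {
    long remaining = cap - nonDfs - dfs;
    System.out.printf("%s: DFS used %.0f%%, remaining %.0f%% (%d GB free)%n",
        name, 100.0 * dfs / cap, 100.0 * remaining / cap, remaining >> 30);
  }
}

Placement by DFS Used % treats A and B as nearly equal, but A has only 10 GB
of real headroom left.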



Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi

Raghu Angadi wrote:


If you are adding and deleting files in the directory, you might notice 
CPU penalty (for many loads, higher CPU on NN is not an issue). This is 
mainly because HDFS does a binary search on files in a directory each 
time it inserts a new file.


I should add that an equal or even bigger cost is the memmove that 
ArrayList does when you add or delete entries.


An ArrayList, rather than a map, is used mainly to save memory, the most 
precious resource for the NameNode.


Raghu.
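
For readers wondering why adds are costly even with the binary search: the
generic sorted-ArrayList insert pattern looks like the sketch below (this is
just the pattern, not the actual NameNode code). The search is cheap, but
add() has to shift every element after the insertion point.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedInsert {
  // Insert 'name' into an already-sorted list, keeping it sorted.
  static void insertSorted(List<String> children, String name) {
    int idx = Collections.binarySearch(children, name);  // O(log n) comparisons
    if (idx < 0) {
      idx = -(idx + 1);                                   // convert to the insertion point
    }
    children.add(idx, name);  // O(n): ArrayList shifts (memmove) everything after idx
  }

  public static void main(String[] args) {
    List<String> dir = new ArrayList<String>();
    for (int i = 0; i < 100000; i++) {
      // "file-10" sorts before "file-2", so many inserts land mid-list and pay the shift cost.
      insertSorted(dir, "file-" + i);
    }
  }
}

The flip side is memory: a sorted ArrayList stores only the element
references, while a tree or hash map adds per-entry node overhead, which
matters when the whole namespace has to fit in NameNode RAM.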


If the directory is relatively idle, then there is no penalty.

Raghu.

Mark Kerzner wrote:

Hi,

there is a performance penalty in Windows (pardon the expression) if 
you put
too many files in the same directory. The OS becomes very slow, stops 
seeing
them, and lies about their status to my Java requests. I do not know 
if this
is also a problem in Linux, but in HDFS - do I need to balance a 
directory
tree if I want to store millions of files, or can I put them all in 
the same

directory?

Thank you,
Mark







Re: HDFS - millions of files in one directory?

2009-01-23 Thread Mark V
On Sat, Jan 24, 2009 at 10:03 AM, Mark Kerzner  wrote:
> Hi,
>
> there is a performance penalty in Windows (pardon the expression) if you put
> too many files in the same directory. The OS becomes very slow, stops seeing
> them, and lies about their status to my Java requests. I do not know if this
> is also a problem in Linux, but in HDFS - do I need to balance a directory
> tree if I want to store millions of files, or can I put them all in the same
> directory?
>
From my old Windows days...
There is a registry setting to turn off a feature whereby Windows
keeps a mapping of 8.3 filenames to the full filenames - I can't recall
it exactly, but it is worth looking for.
Also try naming your files so that the 'unique' part of the filename
comes first, e.g. 123_inventoryid.ext is 'better' than
inventoryid_123.ext.

HTH
Mark

> Thank you,
> Mark
>


Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi


If you are adding and deleting files in the directory, you might notice 
CPU penalty (for many loads, higher CPU on NN is not an issue). This is 
mainly because HDFS does a binary search on files in a directory each 
time it inserts a new file.


If the directory is relatively idle, then there is no penalty.

Raghu.

Mark Kerzner wrote:

Hi,

there is a performance penalty in Windows (pardon the expression) if you put
too many files in the same directory. The OS becomes very slow, stops seeing
them, and lies about their status to my Java requests. I do not know if this
is also a problem in Linux, but in HDFS - do I need to balance a directory
tree if I want to store millions of files, or can I put them all in the same
directory?

Thank you,
Mark





HDFS - millions of files in one directory?

2009-01-23 Thread Mark Kerzner
Hi,

there is a performance penalty in Windows (pardon the expression) if you put
too many files in the same directory. The OS becomes very slow, stops seeing
them, and lies about their status to my Java requests. I do not know if this
is also a problem in Linux, but in HDFS - do I need to balance a directory
tree if I want to store millions of files, or can I put them all in the same
directory?

Thank you,
Mark


Re: hadoop consulting?

2009-01-23 Thread Christophe Bisciglia
Thanks Mark. I'll be getting in touch early next week.

Others, I see replies default straight to the list. Please feel free to
email just me (christo...@cloudera.com), unless, well, you're in the
mood to share your bio with everyone :-)

Cheers,
Christophe

On Fri, Jan 23, 2009 at 2:31 PM, Mark Kerzner - SHMSoft
 wrote:
> Christophe,
>
> I am writing my first Hadoop project now, and I have 20 years of consulting,
> and I am in Houston. Here is my resume, http://markkerzner.googlepages.com.
> I have used EC2.
>
> Sincerely,
> Mark
>
>
> On Fri, Jan 23, 2009 at 4:04 PM, Christophe Bisciglia <
> christo...@cloudera.com> wrote:
>
>> Hey all, I wanted to reach out to the user / development community to
>> start identifying those of you who are interested in consulting /
>> contract work for new Hadoop deployments.
>>
>> A number of our larger customers are asking for more extensive on-site
>> help than would normally happen under a support contract, especially
>> to get them started. We're looking for some outside help to staff
>> these projects. This list is where the right people hang out.
>>
>> If you're interested, drop me a note with a bit of background, and
>> we'll figure it out from there.
>>
>> Cheers,
>> Christophe and the Cloudera Team
>>
>


Re: How-to in MapReduce

2009-01-23 Thread Mark Kerzner
Tim,

I looked there, but it is a setup manual. I read the MapReduce and Sawzall
papers, and the MS paper on these, but I need "best practices."

Thank you,
Mark

On Fri, Jan 23, 2009 at 3:22 PM, tim robertson wrote:

> Hi,
>
> Sounds like you might want to look at the Nutch project architecture
> and then see the Nutch on Hadoop tutorial -
> http://wiki.apache.org/nutch/NutchHadoopTutorial  It does web
> crawling, and indexing using Lucene.  It would be a good place to
> start anyway for ideas, even if it doesn't end up meeting your exact
> needs.
>
> Cheers,
>
> Tim
>
>
> On Fri, Jan 23, 2009 at 10:11 PM, Mark Kerzner 
> wrote:
> > Hi, esteemed group,
> > how would I form Maps in MapReduce to recursively look at every file in a
> > directory, and do something to this file, such as produce a PDF or
> compute
> > its hash?
> >
> > For that matter, Google builds its index using MapReduce, or so the
> papers
> > say. First the crawlers store all the files. Are there papers on the
> > architecture of this? How and where do the crawlers store the downloaded
> > files?
> >
> > Thank you,
> > Mark
> >
>


Re: hadoop consulting?

2009-01-23 Thread Mark Kerzner - SHMSoft
Christophe,

I am writing my first Hadoop project now, and I have 20 years of consulting,
and I am in Houston. Here is my resume, http://markkerzner.googlepages.com.
I have used EC2.

Sincerely,
Mark


On Fri, Jan 23, 2009 at 4:04 PM, Christophe Bisciglia <
christo...@cloudera.com> wrote:

> Hey all, I wanted to reach out to the user / development community to
> start identifying those of you who are interested in consulting /
> contract work for new Hadoop deployments.
>
> A number of our larger customers are asking for more extensive on-site
> help than would normally happen under a support contract, especially
> to get them started. We're looking for some outside help to staff
> these projects. This list is where the right people hang out.
>
> If you're interested, drop me a note with a bit of background, and
> we'll figure it out from there.
>
> Cheers,
> Christophe and the Cloudera Team
>


hadoop consulting?

2009-01-23 Thread Christophe Bisciglia
Hey all, I wanted to reach out to the user / development community to
start identifying those of you who are interested in consulting /
contract work for new Hadoop deployments.

A number of our larger customers are asking for more extensive on-site
help than would normally happen under a support contract, especially
to get them started. We're looking for some outside help to staff
these projects. This list is where the right people hang out.

If you're interested, drop me a note with a bit of background, and
we'll figure it out from there.

Cheers,
Christophe and the Cloudera Team


RE: Problem running hdfs_test

2009-01-23 Thread Arifa Nisar
Thanks a lot for your help. I solved the problem by removing LDFLAGS 
(containing libjvm.so) from the hdfs_test compilation. I had added that flag to 
get it to compile with the Makefile, but it was the real problem. Only after 
removing it was I able to run with ant. 

Thanks,
Arifa
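
On the postscript in the quoted message below (a Java version of hdfs_test):
a minimal Java smoke test of the same create/write/read path could look
roughly like the sketch here (the path and message are arbitrary; this is not
a port of hdfs_test.c):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();     // reads hadoop-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/hdfs_smoke_test.txt");

    FSDataOutputStream out = fs.create(p, true);  // overwrite if it already exists
    out.writeUTF("hello from java");
    out.close();

    FSDataInputStream in = fs.open(p);
    String back = in.readUTF();
    in.close();

    System.out.println("read back: " + back);
    fs.delete(p, false);                          // clean up
  }
}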

-Original Message-
From: Rasit OZDAS [mailto:rasitoz...@gmail.com] 
Sent: Friday, January 23, 2009 6:47 AM
To: core-user@hadoop.apache.org
Subject: Re: Problem running hdfs_test

Hi, Arifa

I had to add "LD_LIBRARY_PATH" env. var. to correctly run my example.
But I have no idea if it helps, because my error wasn't a segmentation
fault. I would try it anyway.

LD_LIBRARY_PATH:/usr/JRE/jre1.6.0_11/jre1.6.0_11/lib:/usr/JRE/jre1.6.0_11/jre1.6.0_11/lib/amd64/server

(server directory of a JRE, which contains libjvm.so file, and lib directory
of the same JRE.)

Hope this helps,
Rasit

2009/1/21 Arifa Nisar 

> Hello,
>
> As I mentioned in my previous email, I am having segmentation fault at
> 0x0001 while running hdfs_test. I was suggested to build and
> run
> hdfs_test using ant, as ant should set some environment variables which
> Makefile won't. I tried building libhdfs and running hdfs_test using ant
> but
> I am still having same problem. Now, instead of hdfs_test, I am testing a
> simple test with libhdfs. I linked a following hello world program with
> libhdfs.
>
> #include "hdfs.h"
> int main() {
>  printf("Hello World.\n");
>  return(0);
> }
>
> I added a line to compile this test program in
> ${HADOOP_HOME}/src/c++/libhdfs/Makefile and replaced hdfs_test with this
> test program in {HADOOP_HOME}/src/c++/libhdfs/tests/test-libhdfs.sh. I
> build
> and invoked this test using test-libhdfs target in build.xml but I am still
> having segmentation fault when this simple test program is invoked from
> test-libhdfs.sh. I followed the following steps
>
> cd ${HADOOP_HOME}
> ant clean
> cd ${HADOOP_HOME}/src/c++/libhdfs/
> rm -f hdfs_test hdfs_write hdfs_read libhdfs.so* *.o test
> cd ${HADOOP_HOME}
> ant test-libhdfs -Dlibhdfs=1
>
> Error Line
> --
> [exec] ./tests/test-libhdfs.sh: line 85: 23019 Segmentation fault
> $LIBHDFS_BUILD_DIR/$HDFS_TEST
>
> I have attached the output of this command with this email. I have added
> "env" in test-libhdfs.sh to see what environmental variable are set. Please
> suggest if any variable is wrongly set. Any kind of suggestion will be
> helpful for me as I have already spent a lot of time on this problem.
>
> I have added following lines in Makefile and test-libhdfs.sh
>
> Makefile
> -
> export JAVA_HOME=/usr/lib/jvm/java-1.7.0-icedtea-1.7.0.0.x86_64
> export OS_ARCH=amd64
> export OS_NAME=Linux
> export LIBHDFS_BUILD_DIR=$(HADOOP_HOME)/src/c++/libhdfs
> export SHLIB_VERSION=1
>
> test-libhdfs.sh
> --
>
> HADOOP_CONF_DIR=${HADOOP_HOME}/conf
> HADOOP_LOG_DIR=${HADOOP_HOME}/logs
> LIBHDFS_BUILD_DIR=${HADOOP_HOME}/src/c++/libhdfs
> HDFS_TEST=test
>
> When I don't link libhdfs with test.c it doesn't give an error and prints
> "Hello World" when "ant test-libhdfs -Dlibhdfs=1" is run. I made sure that
> "ant" and "hadoop" use the same Java installation, and I have tried this on a
> 32-bit machine, but I am still having the segmentation fault. Now, I am
> clueless what I can do to correct this. Please help.
>
> Thanks,
> Arifa.
>
> PS: Also please suggest is there any java version of hdfs_test?
>
> -Original Message-
> From: Delip Rao [mailto:delip...@gmail.com]
> Sent: Saturday, January 17, 2009 3:49 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Problem running unning hdfs_test
>
> Try enabling the debug flags while compiling to get more information.
>
> On Sat, Jan 17, 2009 at 4:19 AM, Arifa Nisar 
> wrote:
> > Hello all,
> >
> >
> >
> > I am trying to test hdfs_test.c provided with hadoop installation.
> > libhdfs.so and hdfs_test are built fine after making a few  changes in
> > $(HADOOP_HOME)/src/c++/libhdfs/Makefile. But when I try to run
> ./hdfs_test,
> > I get segmentation fault at 0x0001
> >
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> >
> > 0x0001 in ?? ()
> >
> > (gdb) bt
> >
> > #0  0x0001 in ?? ()
> >
> > #1  0x7fffd0c51af5 in ?? ()
> >
> > #2  0x in ?? ()
> >
> >
> >
> > A simple hello world program linked with libhdfs.so also gives the same
> > error. In CLASSPATH all the jar files in $(HADOOP_HOME),
> > $(HADOOP_HOME)/conf, $(HADOOP_HOME)/lib,$(JAVA_HOME)/lib are included.
> > Please help.
> >
> >
> >
> > Thanks,
> >
> > Arifa.
> >
> >
> >
> >
>



-- 
M. Raşit ÖZDAŞ



Re: How-to in MapReduce

2009-01-23 Thread tim robertson
Hi,

Sounds like you might want to look at the Nutch project architecture
and then see the Nutch on Hadoop tutorial -
http://wiki.apache.org/nutch/NutchHadoopTutorial  It does web
crawling, and indexing using Lucene.  It would be a good place to
start anyway for ideas, even if it doesn't end up meeting your exact
needs.

Cheers,

Tim


On Fri, Jan 23, 2009 at 10:11 PM, Mark Kerzner  wrote:
> Hi, esteemed group,
> how would I form Maps in MapReduce to recursively look at every file in a
> directory, and do something to this file, such as produce a PDF or compute
> its hash?
>
> For that matter, Google builds its index using MapReduce, or so the papers
> say. First the crawlers store all the files. Are there papers on the
> architecture of this? How and where do the crawlers store the downloaded
> files?
>
> Thank you,
> Mark
>


How-to in MapReduce

2009-01-23 Thread Mark Kerzner
Hi, esteemed group,
how would I form Maps in MapReduce to recursively look at every file in a
directory, and do something to this file, such as produce a PDF or compute
its hash?
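
One common shape for this - a sketch only, using the old
org.apache.hadoop.mapred API of the 0.18/0.19 era, and assuming the job's
input is a plain text file listing one HDFS path per line (the class name is
made up):

import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class HashFileMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private FileSystem fs;

  public void configure(JobConf job) {
    try {
      fs = FileSystem.get(job);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  // key = byte offset in the listing, value = one HDFS path per line.
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    String pathStr = value.toString().trim();
    if (pathStr.length() == 0) {
      return;
    }
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IOException("MD5 not available");
    }
    FSDataInputStream in = fs.open(new Path(pathStr));
    try {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) > 0) {
        md5.update(buf, 0, n);
        reporter.progress();   // keep the task alive on large files
      }
    } finally {
      in.close();
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    output.collect(new Text(pathStr), new Text(hex.toString()));
  }
}

The listing itself can come from a prior step (e.g. the output of
hadoop fs -lsr trimmed to paths), and NLineInputFormat is one way to control
how many paths each map task receives.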

For that matter, Google builds its index using MapReduce, or so the papers
say. First the crawlers store all the files. Are there papers on the
architecture of this? How and where do the crawlers store the downloaded
files?

Thank you,
Mark


Re: Why does Hadoop need ssh access to master and slaves?

2009-01-23 Thread Edward Capriolo
I am looking to create some RA scripts and experiment with starting
hadoop via linux-ha cluster manager.  Linux HA would handle restarting
downed nodes and eliminate the ssh key dependency.


Re: using distcp for http source files

2009-01-23 Thread Doug Cutting
Can you please attach your latest version of this to 
https://issues.apache.org/jira/browse/HADOOP-496?


Thanks,

Doug

Boris Musykantski wrote:

we have fixed up some patches in JIRA for support of a WebDAV server on
top of HDFS, updated them to work with a newer version (0.18.0 IIRC) and
added support for permissions.  See the code and description here:

http://www.hadoop.iponweb.net/Home/hdfs-over-webdav

Hope it is useful,

Regards,
Boris, IPonWeb

On Thu, Jan 22, 2009 at 2:30 PM, Doug Cutting  wrote:

Aaron Kimball wrote:

Is anyone aware of an OSS web dav library that
could be wrapped in a FileSystem implementation?

We'd need a Java WebDAV client to talk to foreign filesystems.  But to
expose HDFS to foreign filesystems (i.e., to better support mounting HDFS)
we'd need a Java WebDAV server, like http://milton.ettrema.com/.

Doug



Re: HDFS losing blocks or connection error

2009-01-23 Thread Raghu Angadi


> It seems hdfs isn't so robust or reliable as the website says and/or I
> have a configuration issue.

Quite possible. How robust does the website say it is?

I agree that debugging failures like the following is pretty hard for casual 
users. You need to look at the logs for the block, or run 'bin/hadoop fsck 
/stats.txt', etc. The reason could be as simple as no live datanodes, or as 
complex as strange network behavior triggering a bug in DFSClient.


You can start by looking at (or attaching) the client log around the lines 
that contain the block id. Also, note the version you are running.


Raghu.

Zak, Richard [USA] wrote:
Might there be a reason for why this seems to routinely happen to me 
when using Hadoop 0.19.0 on Amazon EC2?
 
09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block 
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
live nodes contain current block
09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block 
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
live nodes contain current block
09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block 
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
live nodes contain current block
09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException: 
Could not obtain block: blk_-1757733438820764312_6736 file=/stats.txt
 
 
Richard J. Zak




AlredyBeingCreatedExceptions after upgrade to 0.19.0

2009-01-23 Thread Stefan Will
Hi,

Since I've upgraded to 0.19.0, I've been getting the following exceptions
when restarting jobs, or even when a failed reducer is being restarted by
the job tracker. It appears that stale file locks in the namenode don't get
properly released sometimes:

org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
create file ... for DFSClient_attempt_200901211615_0077_r_10_3 on client
10.1.20.119 because current leaseholder is trying to recreate file.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1052)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:995)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

Has anyone run into this ? Any workarounds/fixes ?

Thanks,
Stefan



Re: HDFS losing blocks or connection error

2009-01-23 Thread Konstantin Shvachko

Yes guys, we have observed such problems.
They will be common in 0.18.2 and 0.19.0, exactly as you
described, when data-nodes become unstable.

There were several issues; please take a look:
HADOOP-4997 workaround for tmp file handling on DataNodes
HADOOP-4663 - links to other related
HADOOP-4810 Data lost at cluster startup
HADOOP-4702 Failed block replication leaves an incomplete block


We run 0.18.3 now and it does not have these problems.
0.19.1 should be the same.

Thanks,
--Konstantin

Zak, Richard [USA] wrote:

It happens right after the MR job (though once or twice it's happened
during).  I am not using EBS, just HDFS between the machines.  As for tasks,
there are 4 mappers and 0 reducers.


Richard J. Zak

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Friday, January 23, 2009 13:24
To: core-user@hadoop.apache.org
Subject: Re: HDFS loosing blocks or connection error

xlarge is good. Is it normally happening during a MR job? If so, how many
tasks do you have running at the same moment overall? Also, is your data
stored on EBS?

Thx,

J-D

On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA]
wrote:


4 slaves, 1 master, all are the m1.xlarge instance type.


Richard J. Zak

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of 
Jean-Daniel Cryans

Sent: Friday, January 23, 2009 12:34
To: core-user@hadoop.apache.org
Subject: Re: HDFS loosing blocks or connection error

Richard,

This happens when the datanodes are too slow and eventually all 
replicas for a single block are tagged as "bad".  What kind of 
instances are you using?

How many of them?

J-D

On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA]
wrote:

 Might there be a reason for why this seems to routinely happen to 
me when using Hadoop 0.19.0 on Amazon EC2?


09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
No live nodes contain current block

09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
No live nodes contain current block

09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
No live nodes contain current block

09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException:
Could not obtain block: blk_-1757733438820764312_6736 
file=/stats.txt It seems hdfs isn't so robust or reliable as the 
website says and/or I have a configuration issue.



 Richard J. Zak



Re: HDFS losing blocks or connection error

2009-01-23 Thread Jean-Daniel Cryans
Yes, you may overload your machines that way because of the small number of
nodes. One thing to do would be to look in the logs for any signs of
IOExceptions and report them back here. Another thing you can do is change
some configs: increase dfs.datanode.max.xcievers to 512 and set
dfs.datanode.socket.write.timeout to 0 (this is supposed to be fixed in 0.19
but I've had some problems). An HDFS restart is required.

J-D

On Fri, Jan 23, 2009 at 1:26 PM, Zak, Richard [USA] wrote:

> It happens right after the MR job (though once or twice it's happened
> during).  I am not using EBS, just HDFS between the machines.  As for
> tasks,
> there are 4 mappers and 0 reducers.
>
>
> Richard J. Zak
>
> -Original Message-
> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Friday, January 23, 2009 13:24
> To: core-user@hadoop.apache.org
> Subject: Re: HDFS loosing blocks or connection error
>
> xlarge is good. Is it normally happening during a MR job? If so, how many
> tasks do you have running at the same moment overall? Also, is your data
> stored on EBS?
>
> Thx,
>
> J-D
>
> On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA]
> wrote:
>
> > 4 slaves, 1 master, all are the m1.xlarge instance type.
> >
> >
> > Richard J. Zak
> >
> > -Original Message-
> > From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
> > Jean-Daniel Cryans
> > Sent: Friday, January 23, 2009 12:34
> > To: core-user@hadoop.apache.org
> > Subject: Re: HDFS loosing blocks or connection error
> >
> > Richard,
> >
> > This happens when the datanodes are too slow and eventually all
> > replicas for a single block are tagged as "bad".  What kind of
> > instances are you using?
> > How many of them?
> >
> > J-D
> >
> > On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA]
> > wrote:
> >
> > >  Might there be a reason for why this seems to routinely happen to
> > > me when using Hadoop 0.19.0 on Amazon EC2?
> > >
> > > 09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
> > > blk_-1757733438820764312_6736 from any node:  java.io.IOException:
> > > No live nodes contain current block
> > > 09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
> > > blk_-1757733438820764312_6736 from any node:  java.io.IOException:
> > > No live nodes contain current block
> > > 09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
> > > blk_-1757733438820764312_6736 from any node:  java.io.IOException:
> > > No live nodes contain current block
> > > 09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException:
> > > Could not obtain block: blk_-1757733438820764312_6736
> > > file=/stats.txt It seems hdfs isn't so robust or reliable as the
> > > website says and/or I have a configuration issue.
> > >
> > >
> > >  Richard J. Zak
> > >
> >
>


RE: HDFS losing blocks or connection error

2009-01-23 Thread Zak, Richard [USA]
It happens right after the MR job (though once or twice it's happened
during).  I am not using EBS, just HDFS between the machines.  As for tasks,
there are 4 mappers and 0 reducers.


Richard J. Zak

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Friday, January 23, 2009 13:24
To: core-user@hadoop.apache.org
Subject: Re: HDFS loosing blocks or connection error

xlarge is good. Is it normally happening during a MR job? If so, how many
tasks do you have running at the same moment overall? Also, is your data
stored on EBS?

Thx,

J-D

On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA]
wrote:

> 4 slaves, 1 master, all are the m1.xlarge instance type.
>
>
> Richard J. Zak
>
> -Original Message-
> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of 
> Jean-Daniel Cryans
> Sent: Friday, January 23, 2009 12:34
> To: core-user@hadoop.apache.org
> Subject: Re: HDFS loosing blocks or connection error
>
> Richard,
>
> This happens when the datanodes are too slow and eventually all 
> replicas for a single block are tagged as "bad".  What kind of 
> instances are you using?
> How many of them?
>
> J-D
>
> On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA]
> wrote:
>
> >  Might there be a reason for why this seems to routinely happen to 
> > me when using Hadoop 0.19.0 on Amazon EC2?
> >
> > 09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
> > blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
> > No live nodes contain current block
> > 09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
> > blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
> > No live nodes contain current block
> > 09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
> > blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
> > No live nodes contain current block
> > 09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException:
> > Could not obtain block: blk_-1757733438820764312_6736 
> > file=/stats.txt It seems hdfs isn't so robust or reliable as the 
> > website says and/or I have a configuration issue.
> >
> >
> >  Richard J. Zak
> >
>




Re: HDFS losing blocks or connection error

2009-01-23 Thread Jean-Daniel Cryans
xlarge is good. Is it normally happening during a MR job? If so, how many
tasks do you have running at the same moment overall? Also, is your data
stored on EBS?

Thx,

J-D

On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA] wrote:

> 4 slaves, 1 master, all are the m1.xlarge instance type.
>
>
> Richard J. Zak
>
> -Original Message-
> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Friday, January 23, 2009 12:34
> To: core-user@hadoop.apache.org
> Subject: Re: HDFS loosing blocks or connection error
>
> Richard,
>
> This happens when the datanodes are too slow and eventually all replicas
> for
> a single block are tagged as "bad".  What kind of instances are you using?
> How many of them?
>
> J-D
>
> On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA]
> wrote:
>
> >  Might there be a reason for why this seems to routinely happen to me
> > when using Hadoop 0.19.0 on Amazon EC2?
> >
> > 09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
> > blk_-1757733438820764312_6736 from any node:  java.io.IOException: No
> > live nodes contain current block
> > 09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
> > blk_-1757733438820764312_6736 from any node:  java.io.IOException: No
> > live nodes contain current block
> > 09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
> > blk_-1757733438820764312_6736 from any node:  java.io.IOException: No
> > live nodes contain current block
> > 09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException:
> > Could not obtain block: blk_-1757733438820764312_6736 file=/stats.txt
> > It seems hdfs isn't so robust or reliable as the website says and/or I
> > have a configuration issue.
> >
> >
> >  Richard J. Zak
> >
>


RE: HDFS losing blocks or connection error

2009-01-23 Thread Zak, Richard [USA]
4 slaves, 1 master, all are the m1.xlarge instance type. 


Richard J. Zak

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Friday, January 23, 2009 12:34
To: core-user@hadoop.apache.org
Subject: Re: HDFS loosing blocks or connection error

Richard,

This happens when the datanodes are too slow and eventually all replicas for
a single block are tagged as "bad".  What kind of instances are you using?
How many of them?

J-D

On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA]
wrote:

>  Might there be a reason for why this seems to routinely happen to me 
> when using Hadoop 0.19.0 on Amazon EC2?
>
> 09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
> blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
> live nodes contain current block
> 09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
> blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
> live nodes contain current block
> 09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
> blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
> live nodes contain current block
> 09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException: 
> Could not obtain block: blk_-1757733438820764312_6736 file=/stats.txt 
> It seems hdfs isn't so robust or reliable as the website says and/or I 
> have a configuration issue.
>
>
>  Richard J. Zak
>




Re: HDFS losing blocks or connection error

2009-01-23 Thread Jean-Daniel Cryans
Richard,

This happens when the datanodes are too slow and eventually all replicas for
a single block are tagged as "bad".  What kind of instances are you using?
How many of them?

J-D

On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA] wrote:

>  Might there be a reason for why this seems to routinely happen to me when
> using Hadoop 0.19.0 on Amazon EC2?
>
> 09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
> blk_-1757733438820764312_6736 from any node:  java.io.IOException: No live
> nodes contain current block
> 09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
> blk_-1757733438820764312_6736 from any node:  java.io.IOException: No live
> nodes contain current block
> 09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
> blk_-1757733438820764312_6736 from any node:  java.io.IOException: No live
> nodes contain current block
> 09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could
> not obtain block: blk_-1757733438820764312_6736 file=/stats.txt
> It seems hdfs isn't so robust or reliable as the website says and/or I have
> a configuration issue.
>
>
>  Richard J. Zak
>


HDFS losing blocks or connection error

2009-01-23 Thread Zak, Richard [USA]
Might there be a reason for why this seems to routinely happen to me when
using Hadoop 0.19.0 on Amazon EC2?
 
09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No live
nodes contain current block
09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No live
nodes contain current block
09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No live
nodes contain current block
09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could
not obtain block: blk_-1757733438820764312_6736 file=/stats.txt

It seems HDFS isn't as robust or reliable as the website says, and/or I have
a configuration issue.
 
 
Richard J. Zak




Re: Why does Hadoop need ssh access to master and slaves?

2009-01-23 Thread Matthias Scherer
Hi Tom,

Thanks for your reply. That's what I wanted to know. And it's good to know that 
it would not be a showstopper if our ops department wanted to use their 
own system to control the daemons.

Regards
Matthias 


> -Original Message-
> From: Tom White [mailto:t...@cloudera.com] 
> Sent: Wednesday, January 21, 2009 14:47
> To: core-user@hadoop.apache.org
> Subject: Re: Why does Hadoop need ssh access to master and slaves?
> 
> Hi Matthias,
> 
> It is not necessary to have SSH set up to run Hadoop, but it 
> does make things easier. SSH is used by the scripts in the 
> bin directory which start and stop daemons across the cluster 
> (the slave nodes are defined in the slaves file), see the 
> start-all.sh script as a starting point.
> These scripts are a convenient way to control Hadoop, but 
> there are other possibilities. If you had another system to 
> control daemons on your cluster then you wouldn't need SSH.
> 
> Tom
> 
> On Wed, Jan 21, 2009 at 1:20 PM, Matthias Scherer 
>  wrote:
> > Hi Steve and Amit,
> >
> > Thanks for your answers. I agree with you that key-based ssh is nothing to
> > worry about. But I'm wondering what exactly - that is, which grid
> > administration tasks - hadoop does via ssh?! Does it restart crashed data
> > nodes or task trackers on the slaves? Or does it transfer data over the
> > grid with ssh access? How can I find a short description of what exactly
> > hadoop needs ssh for? The documentation says only that I have to
> > configure it.
> >
> > Thanks & Regards
> > Matthias
> >
> >
> >> -Original Message-
> >> From: Steve Loughran [mailto:ste...@apache.org]
> >> Sent: Wednesday, January 21, 2009 13:59
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: Why does Hadoop need ssh access to master and slaves?
> >>
> >> Amit k. Saha wrote:
> >> > On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer 
> >> >  wrote:
> >> >> Hi all,
> >> >>
> >> >> we've made our first steps in evaluating hadoop. The setup
> >> of 2 VMs
> >> >> as a hadoop grid was very easy and works fine.
> >> >>
> >> >> Now our operations team wonders why hadoop has to be able
> >> to connect
> >> >> to the master and slaves via password-less ssh?! Can
> >> anyone give us
> >> >> an answer to this question?
> >> >
> >> > 1. There has to be a way to connect to the remote hosts-
> >> slaves and a
> >> > secondary master, and SSH is the secure way to do it 2. It
> >> has to be
> >> > password-less to enable automatic logins
> >> >
> >>
> >> SSH is *a* secure way to do it, but not the only way. Other 
> >> management tools can bring up hadoop clusters. Hadoop ships with 
> >> scripted support for SSH as it is standard with Linux distros and 
> >> generally the best way to bring up a remote console.
> >>
> >> Matthias,
> >> Your ops team should not be worrying about the SSH 
> security, as long 
> >> as they keep their keys under control.
> >>
> >> (a) Key-based SSH is more secure than passworded SSH, as 
> >> man-in-the-middle attacks are prevented. Passphrase-protected SSH keys on 
> >> external USB keys are even better.
> >>
> >> (b) once the cluster is up, that filesystem is pretty 
> vulnerable to 
> >> anything on the LAN. You do need to lock down your 
> datacentre, or set 
> >> up the firewall/routing of the servers so that only 
> trusted hosts can 
> >> talk to the FS. SSH becomes a detail at that point.
> >>
> >>
> >>
> >
> 


Re: Problem running hdfs_test

2009-01-23 Thread Rasit OZDAS
Hi, Arifa

I had to add "LD_LIBRARY_PATH" env. var. to correctly run my example.
But I have no idea if it helps, because my error wasn't a segmentation
fault. I would try it anyway.

LD_LIBRARY_PATH:/usr/JRE/jre1.6.0_11/jre1.6.0_11/lib:/usr/JRE/jre1.6.0_11/jre1.6.0_11/lib/amd64/server

(server directory of a JRE, which contains libjvm.so file, and lib directory
of the same JRE.)

Hope this helps,
Rasit

2009/1/21 Arifa Nisar 

> Hello,
>
> As I mentioned in my previous email, I am having segmentation fault at
> 0x0001 while running hdfs_test. I was suggested to build and
> run
> hdfs_test using ant, as ant should set some environment variables which
> Makefile won't. I tried building libhdfs and running hdfs_test using ant
> but
> I am still having same problem. Now, instead of hdfs_test, I am testing a
> simple test with libhdfs. I linked a following hello world program with
> libhdfs.
>
> #include "hdfs.h"
> int main() {
>  printf("Hello World.\n");
>  return(0);
> }
>
> I added a line to compile this test program in
> ${HADOOP_HOME}/src/c++/libhdfs/Makefile and replaced hdfs_test with this
> test program in {HADOOP_HOME}/src/c++/libhdfs/tests/test-libhdfs.sh. I
> build
> and invoked this test using test-libhdfs target in build.xml but I am still
> having segmentation fault when this simple test program is invoked from
> test-libhdfs.sh. I followed the following steps
>
> cd ${HADOOP_HOME}
> ant clean
> cd ${HADOOP_HOME}/src/c++/libhdfs/
> rm -f hdfs_test hdfs_write hdfs_read libhdfs.so* *.o test
> cd ${HADOOP_HOME}
> ant test-libhdfs -Dlibhdfs=1
>
> Error Line
> --
> [exec] ./tests/test-libhdfs.sh: line 85: 23019 Segmentation fault
> $LIBHDFS_BUILD_DIR/$HDFS_TEST
>
> I have attached the output of this command with this email. I have added
> "env" in test-libhdfs.sh to see what environmental variable are set. Please
> suggest if any variable is wrongly set. Any kind of suggestion will be
> helpful for me as I have already spent a lot of time on this problem.
>
> I have added following lines in Makefile and test-libhdfs.sh
>
> Makefile
> -
> export JAVA_HOME=/usr/lib/jvm/java-1.7.0-icedtea-1.7.0.0.x86_64
> export OS_ARCH=amd64
> export OS_NAME=Linux
> export LIBHDFS_BUILD_DIR=$(HADOOP_HOME)/src/c++/libhdfs
> export SHLIB_VERSION=1
>
> test-libhdfs.sh
> --
>
> HADOOP_CONF_DIR=${HADOOP_HOME}/conf
> HADOOP_LOG_DIR=${HADOOP_HOME}/logs
> LIBHDFS_BUILD_DIR=${HADOOP_HOME}/src/c++/libhdfs
> HDFS_TEST=test
>
> When I don't link libhdfs with test.c it doesn't give an error and prints
> "Hello World" when "ant test-libhdfs -Dlibhdfs=1" is run. I made sure that
> "ant" and "hadoop" use the same Java installation, and I have tried this on a
> 32-bit machine, but I am still having the segmentation fault. Now, I am
> clueless what I can do to correct this. Please help.
>
> Thanks,
> Arifa.
>
> PS: Also please suggest is there any java version of hdfs_test?
>
> -Original Message-
> From: Delip Rao [mailto:delip...@gmail.com]
> Sent: Saturday, January 17, 2009 3:49 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Problem running unning hdfs_test
>
> Try enabling the debug flags while compiling to get more information.
>
> On Sat, Jan 17, 2009 at 4:19 AM, Arifa Nisar 
> wrote:
> > Hello all,
> >
> >
> >
> > I am trying to test hdfs_test.c provided with hadoop installation.
> > libhdfs.so and hdfs_test are built fine after making a few  changes in
> > $(HADOOP_HOME)/src/c++/libhdfs/Makefile. But when I try to run
> ./hdfs_test,
> > I get segmentation fault at 0x0001
> >
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> >
> > 0x0001 in ?? ()
> >
> > (gdb) bt
> >
> > #0  0x0001 in ?? ()
> >
> > #1  0x7fffd0c51af5 in ?? ()
> >
> > #2  0x in ?? ()
> >
> >
> >
> > A simple hello world program linked with libhdfs.so also gives the same
> > error. In CLASSPATH all the jar files in $(HADOOP_HOME),
> > $(HADOOP_HOME)/conf, $(HADOOP_HOME)/lib,$(JAVA_HOME)/lib are included.
> > Please help.
> >
> >
> >
> > Thanks,
> >
> > Arifa.
> >
> >
> >
> >
>



-- 
M. Raşit ÖZDAŞ


Re: Distributed cache testing in local mode

2009-01-23 Thread Tom White
It would be nice to make this more uniform. There's an outstanding
Jira on this if anyone is interested in looking at it:
https://issues.apache.org/jira/browse/HADOOP-2914

Tom

On Fri, Jan 23, 2009 at 12:14 AM, Aaron Kimball  wrote:
> Hi Bhupesh,
>
> I've noticed the same problem -- LocalJobRunner makes the DistributedCache
> effectively not work; so my code often winds up with two codepaths to
> retrieve the local data :\
>
> You could try running in pseudo-distributed mode to test, though then you
> lose the ability to run a single-stepping debugger on the whole end-to-end
> process.
>
> - Aaron
>
> On Thu, Jan 22, 2009 at 11:29 AM, Bhupesh Bansal wrote:
>
>> Hey folks,
>>
>> I am trying to use the DistributedCache in hadoop jobs to pass around
>> configuration files, external jars (job specific) and some archive data.
>>
>> I want to test a job end-to-end in local mode, but I think the distributed
>> caches are localized in TaskTracker code which is not called in local mode
>> through LocalJobRunner.
>>
>> I can do some fairly simple workarounds for this but was just wondering if
>> folks have more ideas about it.
>>
>> Thanks
>> Bhupesh
>>
>>
>
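
For reference, one shape the "two codepaths" workaround Aaron mentions can
take (a sketch; the class name is made up): look for the localized copy
first, and fall back to the original cache URI when running under
LocalJobRunner, which does not localize anything.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheFileLocator {
  // Returns a path to the cached side file, whether or not the job runs
  // under a real TaskTracker.
  public static Path locate(JobConf conf) throws IOException {
    Path[] localized = DistributedCache.getLocalCacheFiles(conf);
    if (localized != null && localized.length > 0) {
      return localized[0];                  // real cluster: the TaskTracker localized it
    }
    URI[] uris = DistributedCache.getCacheFiles(conf);
    if (uris != null && uris.length > 0) {
      return new Path(uris[0].getPath());   // LocalJobRunner: read the original file directly
    }
    throw new IOException("no file registered via DistributedCache.addCacheFile()");
  }
}

The fallback assumes the original URI is still reachable from the process
running the task, which is true in local mode.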


_temporary directory getting deleted mid-job?

2009-01-23 Thread Aaron Kimball
I saw some puzzling behavior tonight when running a MapReduce program I
wrote.

It would perform the mapping just fine, and would begin to shuffle. It got
to 33% complete reduce (end of shuffle) and then the task failed, claiming
that /_temporary was deleted.

I didn't touch HDFS while this was going on.

I tried running the job multiple more times, and this repeated twice more.
Puzzlingly, I was doing bin/hadoop fs -ls  periodically in
another window. The _temporary directory got created just fine, but at some
point after shuffling began, it was removed.

I tried to see if I could manually race this, so I did a mkdir _temporary,
and the job proceeded just fine. Even more bizarre, the removal of the
_temporary directory did not occur on any subsequent MR jobs (executions of
the same, unmodified program). So I can't reproduce the bug.

This is on 0.18.2.

It went away, so I'm not *too* concerned, but I'd rather not deal with
heisenbugs if at all possible.

So: has anyone seen this behavior? Have you figured out how to reproduce it,
or even better, prevent it?

Thanks,
- Aaron