Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/467
  
    The performance penalties are minimal.  The number of files will equal the 
number of reducers, which does not scale with the data and is user-specifiable.  
Also, we are only sorting the file handles here, not the contents, so OOM errors 
are very unlikely.  The contents are sorted by virtue of MapReduce, and the files 
are named in an ordered way by virtue of our custom partitioner; this change just 
ensures that the files are processed in order.
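To illustrate the memory argument above, here is a minimal sketch (the file names are hypothetical, standing in for the output of the custom partitioner): we sort only the small list of file names, never the file contents, so memory use scales with the number of reducers rather than the data volume.

```python
# Hypothetical partition file names; the custom partitioner names files
# so that lexicographic order matches the order of their contents.
files = [
    "pcap-data-00002",
    "pcap-data-00000",
    "pcap-data-00001",
]

# Sorting N file handles is O(N log N) in the number of reducers,
# independent of how much data each file holds, so OOM is very unlikely.
ordered = sorted(files)
print(ordered)
```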
    
    I'm not treating this as just a test problem.  It is a problem of our 
assumptions not being correct.  This could affect the real pcap system, not 
just the test, if people are using a non-HDFS implementation.  For HDFS it's 
probably not an issue (honestly, I'm not even sure of that in all cases, and 
there is no guarantee the behavior won't change, since the ordering is not 
mandated), but I'd rather own our assumptions than depend on FileSystem 
operations that do not necessarily conform to them.
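The same caveat applies to local directory listings: like `FileSystem.listStatus`, Python's `os.listdir` makes no ordering guarantee, so the defensive move is to sort explicitly rather than trust the listing order. A small sketch (file names are hypothetical):

```python
import os
import tempfile

# Create a few hypothetical reducer output files in a temp directory.
d = tempfile.mkdtemp()
for name in ("part-00002", "part-00000", "part-00001"):
    open(os.path.join(d, name), "w").close()

listing = os.listdir(d)   # order is unspecified by the API
ordered = sorted(listing) # own the assumption: sort explicitly
print(ordered)
```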

