Ilya Ganelin created APEXCORE-392:
-------------------------------------

             Summary: Stack Overflow when launching jobs
                 Key: APEXCORE-392
                 URL: https://issues.apache.org/jira/browse/APEXCORE-392
             Project: Apache Apex Core
          Issue Type: Bug
    Affects Versions: 3.2.0
            Reporter: Ilya Ganelin
            Priority: Blocker


I’m running into a very frustrating issue where certain DAG configurations 
produce the attached error log. When this happens, my application fails even 
to launch. This does not seem to be a YARN issue, since it occurs even with a 
relatively small number of partitions and a small memory allocation.

This issue DOES appear to be related to HDFS input/output operations since the 
specific parameter that appears to affect things is the number of physical 
partitions for the HDFS input/output operators.

I’ve also attached the input and output operators in question:
https://gist.github.com/ilganeli/7f770374113b40ffa18a

I can get this to occur predictably by:

  1.  Increasing the partition count on my input operator (reads from HDFS) - 
values above 20 cause this error
  2.  Increasing the partition count on my output operator (writes to HDFS) - 
values above 20 cause this error
  3.  Changing the stream locality on the output operator from the default to 
thread-local, node-local, or container-local

This behavior is very frustrating as it’s preventing me from partitioning my 
HDFS I/O appropriately, which I need to do in order to scale to higher 
throughput.

Do you have any thoughts on what’s going wrong? I would love your feedback.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
