[jira] [Created] (TEZ-4224) Add Laszlo Bodor's public key to KEYS

2020-08-25 Thread Jira
László Bodor created TEZ-4224:
-

 Summary: Add Laszlo Bodor's public key to KEYS
 Key: TEZ-4224
 URL: https://issues.apache.org/jira/browse/TEZ-4224
 Project: Apache Tez
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (TEZ-4224) Add Laszlo Bodor's public key to KEYS

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned TEZ-4224:
-

Assignee: László Bodor

> Add Laszlo Bodor's public key to KEYS
> -
>
> Key: TEZ-4224
> URL: https://issues.apache.org/jira/browse/TEZ-4224
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>






[jira] [Comment Edited] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles

2020-08-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183937#comment-17183937
 ] 

László Bodor edited comment on TEZ-3645 at 8/25/20, 11:08 AM:
--

[~jeagles]: this is marked as a 0.10 blocker; is there anything I can help with 
on this patch? 
(this week I can address the latest comments with a new patch if it helps)


was (Author: abstractdog):
this is marked as a 0.10 blocker; is there anything I can help with on this patch? 
(this week I can address the latest comments if it helps)

> Reuse SerializationFactory while sorting, merging, and writing IFiles 
> --
>
> Key: TEZ-3645
> URL: https://issues.apache.org/jira/browse/TEZ-3645
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: 0.10_blocker
> Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, 
> TEZ-3645.1.patch, TEZ-3645.2.patch
>
>
> Of course this is not reusing the serializer, just the SerializationFactory 
> and Serialization. They are jointly responsible for iterating over the list 
> of available serializers and finding an acceptable one.
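
The lookup described above can be sketched in plain Java. This is only an illustration under stated assumptions: `Serialization`, `Factory` and `Context` below are hypothetical stand-ins written for this example, not the Hadoop or Tez classes, and the `scans` counter exists only to make the saved work visible.

```java
import java.util.ArrayList;
import java.util.List;

final class SerializationReuseSketch {

  // Hypothetical stand-in for a serialization that accepts some classes.
  interface Serialization {
    boolean accept(Class<?> c);
  }

  // Hypothetical stand-in for the factory: scans registered serializations in order.
  static final class Factory {
    private final List<Serialization> serializations = new ArrayList<>();
    int scans = 0; // how many times the list was searched

    void register(Serialization s) {
      serializations.add(s);
    }

    Serialization getSerialization(Class<?> c) {
      scans++;
      for (Serialization s : serializations) {
        if (s.accept(c)) {
          return s;
        }
      }
      return null; // nothing acceptable is registered
    }
  }

  // Resolves the key and value serializations once so callers can reuse them.
  static final class Context {
    final Serialization keySerialization;
    final Serialization valueSerialization;

    Context(Factory factory, Class<?> keyClass, Class<?> valueClass) {
      this.keySerialization = factory.getSerialization(keyClass);
      this.valueSerialization = factory.getSerialization(valueClass);
    }
  }

  public static void main(String[] args) {
    Factory factory = new Factory();
    factory.register(c -> c == String.class);
    factory.register(c -> c == Long.class);
    Context ctx = new Context(factory, String.class, Long.class);
    // Sorting, merging and writing code could share ctx instead of searching again.
    System.out.println("scans after building the context: " + factory.scans);
    System.out.println("key serialization resolved: " + (ctx.keySerialization != null));
  }
}
```

The design point is that the search over registered serializations happens when the context is built, not on every sort, merge, or write call.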





[jira] [Commented] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles

2020-08-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183937#comment-17183937
 ] 

László Bodor commented on TEZ-3645:
---

this is marked as a 0.10 blocker; is there anything I can help with on this patch? 
(this week I can address the latest comments if it helps)

> Reuse SerializationFactory while sorting, merging, and writing IFiles 
> --
>
> Key: TEZ-3645
> URL: https://issues.apache.org/jira/browse/TEZ-3645
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: 0.10_blocker
> Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, 
> TEZ-3645.1.patch, TEZ-3645.2.patch
>
>
> Of course this is not reusing the serializer, just the SerializationFactory 
> and Serialization. They are jointly responsible for iterating over the list 
> of available serializers and finding an acceptable one.





[jira] [Updated] (TEZ-4224) Add Laszlo Bodor's public key to KEYS

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4224:
--
Attachment: TEZ-4224.01.patch

> Add Laszlo Bodor's public key to KEYS
> -
>
> Key: TEZ-4224
> URL: https://issues.apache.org/jira/browse/TEZ-4224
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-4224.01.patch
>
>






[jira] [Commented] (TEZ-4224) Add Laszlo Bodor's public key to KEYS

2020-08-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183945#comment-17183945
 ] 

László Bodor commented on TEZ-4224:
---

[~jeagles]: could you please take a look?

> Add Laszlo Bodor's public key to KEYS
> -
>
> Key: TEZ-4224
> URL: https://issues.apache.org/jira/browse/TEZ-4224
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-4224.01.patch
>
>






[jira] [Commented] (TEZ-4213) Bound appContext executor capacity using a configurable property

2020-08-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183969#comment-17183969
 ] 

László Bodor commented on TEZ-4213:
---

forgot to double-check checkstyle warning, fixed in addendum commit:
https://github.com/apache/tez/commit/99895f9808170ce64fd1e7c6dfb2e932f4578489

> Bound appContext executor capacity using a configurable property
> 
>
> Key: TEZ-4213
> URL: https://issues.apache.org/jira/browse/TEZ-4213
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: 0.10_blocker
> Fix For: 0.10.1
>
> Attachments: TEZ-4213.01.patch, TEZ-4213.02.patch, TEZ-4213.03.patch, 
> TEZ-4213.04.patch, TEZ-4213.05.patch, TEZ-4213.06.patch, TEZ-4213.07.patch, 
> TEZ-4213.08.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After TEZ-4170 was merged, appContext executor pool is also used by the 
> RootInputInitializerManager to speed up SplitGeneration.
> However, this executor pool currently has no capacity limit: 
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L624
> The problem that occurs when generating splits for larger inputs (thousands or 
> more) is that it can result in
> {color:red}java.lang.OutOfMemoryError{color}
> that is also reproducible with a test case.
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L130
> To avoid such errors, I propose to limit the capacity of this pool to a 
> configurable value that can be for example the number of physical cores by 
> default.
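
A minimal sketch of the proposal above, assuming plain `java.util.concurrent` and no Tez classes. `BoundedExecutorSketch`, `newBoundedPool` and `peakConcurrency` are hypothetical names for this example, and the fallback uses `Runtime.availableProcessors()`, which reports logical rather than physical cores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedExecutorSketch {

  // Hypothetical helper, not Tez code: a pool that runs at most `limit` tasks at once.
  // A non-positive limit falls back to the number of logical processors.
  static ThreadPoolExecutor newBoundedPool(int limit) {
    int threads = limit > 0 ? limit : Runtime.getRuntime().availableProcessors();
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        threads, threads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.allowCoreThreadTimeOut(true); // let idle threads exit
    return pool;
  }

  // Submits `taskCount` short tasks and returns the highest concurrency observed.
  static int peakConcurrency(int limit, int taskCount) {
    ThreadPoolExecutor pool = newBoundedPool(limit);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int i = 0; i < taskCount; i++) {
        futures.add(pool.submit(() -> {
          peak.accumulateAndGet(running.incrementAndGet(), Math::max);
          try {
            Thread.sleep(5); // stand-in for split generation work
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          running.decrementAndGet();
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for every task to finish
      }
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }

  public static void main(String[] args) {
    System.out.println("peak concurrency with limit 2: " + peakConcurrency(2, 20));
  }
}
```

Tasks beyond the limit wait in the queue instead of each getting a thread, so the number of concurrently running tasks stays at or below the configured value.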





[jira] [Updated] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3645:
--
Attachment: TEZ-3645.005.patch

> Reuse SerializationFactory while sorting, merging, and writing IFiles 
> --
>
> Key: TEZ-3645
> URL: https://issues.apache.org/jira/browse/TEZ-3645
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: 0.10_blocker
> Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, 
> TEZ-3645.005.patch, TEZ-3645.1.patch, TEZ-3645.2.patch
>
>
> Of course this is not reusing the serializer, just the SerializationFactory 
> and Serialization. They are jointly responsible for iterating over the list 
> of available serializers and finding an acceptable one.





[jira] [Commented] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles

2020-08-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184056#comment-17184056
 ] 

László Bodor commented on TEZ-3645:
---

uploaded  [^TEZ-3645.005.patch]  with the changes: introduced 
SerializationContext and initialized it in MergeManager's constructor


> Reuse SerializationFactory while sorting, merging, and writing IFiles 
> --
>
> Key: TEZ-3645
> URL: https://issues.apache.org/jira/browse/TEZ-3645
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: 0.10_blocker
> Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, 
> TEZ-3645.005.patch, TEZ-3645.1.patch, TEZ-3645.2.patch
>
>
> Of course this is not reusing the serializer, just the SerializationFactory 
> and Serialization. They are jointly responsible for iterating over the list 
> of available serializers and finding an acceptable one.





[jira] [Commented] (TEZ-4224) Add Laszlo Bodor's public key to KEYS

2020-08-25 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184093#comment-17184093
 ] 

Jonathan Turner Eagles commented on TEZ-4224:
-

Can you post a link to the public key server used to verify this public key? 
Also, be careful never to lose the private key. Keep a backup of the key in a 
secure location. If the key is compromised, we can go through the 
process of updating the key.

> Add Laszlo Bodor's public key to KEYS
> -
>
> Key: TEZ-4224
> URL: https://issues.apache.org/jira/browse/TEZ-4224
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-4224.01.patch
>
>






[jira] [Updated] (TEZ-4043) Create a yetus compatible checkstyle configuration

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4043:
--
Fix Version/s: 0.10.0

> Create a yetus compatible checkstyle configuration
> --
>
> Key: TEZ-4043
> URL: https://issues.apache.org/jira/browse/TEZ-4043
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4043.001.patch, TEZ-4043.002.patch
>
>
> Tez follows Hadoop source code guidelines with the exception of 120 character 
> line length.
> http://maven.apache.org/plugins/maven-checkstyle-plugin/examples/multi-module-config.html





[jira] [Updated] (TEZ-3994) Upgrade maven-surefire-plugin to 0.21.0 to support yetus

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3994:
--
Fix Version/s: 0.10.0

> Upgrade maven-surefire-plugin to 0.21.0 to support yetus
> 
>
> Key: TEZ-3994
> URL: https://issues.apache.org/jira/browse/TEZ-3994
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-3994.001.patch
>
>






[jira] [Updated] (TEZ-4022) Upgrade Maven Surefire plugin to 3.0.0-M1

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4022:
--
Fix Version/s: 0.10.0

> Upgrade Maven Surefire plugin to 3.0.0-M1
> -
>
> Key: TEZ-4022
> URL: https://issues.apache.org/jira/browse/TEZ-4022
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4022.001.patch
>
>
> Recently all the unit tests have been failing. This is caused by the latest Java 8 
> issue reported at SUREFIRE-1588 and fixed in Maven Surefire plugin 3.0.0-M1. 
> We need to update the plugin.





[jira] [Updated] (TEZ-3957) Report TASK_DURATION_MILLIS as a Counter for completed tasks

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3957:
--
Fix Version/s: 0.10.0

> Report TASK_DURATION_MILLIS as a Counter for completed tasks
> 
>
> Key: TEZ-3957
> URL: https://issues.apache.org/jira/browse/TEZ-3957
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-3957.01.patch, TEZ-3957.02.patch, TEZ-3957.02.patch, 
> TEZ-3957.03.patch, TEZ-3957.patch
>
>
> timeTaken is already being reported by {{TaskAttemptFinishedEvent}}, but not 
> as a Counter.
> Combined with TEZ-3911, this provides min(timeTaken), max(timeTaken), 
> avg(timeTaken).
> The value will be: {{finishTime - launchTime}}
>  
>  
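
The formula above can be illustrated with a small sketch. It assumes each task attempt is given as a `{launchTime, finishTime}` pair in milliseconds; the class and method names are hypothetical, and `LongSummaryStatistics` only stands in for whatever aggregation TEZ-3911 actually provides.

```java
import java.util.LongSummaryStatistics;

final class TaskDurationSketch {

  // The value described in the issue: finishTime - launchTime.
  static long durationMillis(long launchTime, long finishTime) {
    return finishTime - launchTime;
  }

  // Aggregates durations of {launchTime, finishTime} pairs into min, max and average.
  static LongSummaryStatistics aggregate(long[][] attempts) {
    LongSummaryStatistics stats = new LongSummaryStatistics();
    for (long[] attempt : attempts) {
      stats.accept(durationMillis(attempt[0], attempt[1]));
    }
    return stats;
  }

  public static void main(String[] args) {
    long[][] attempts = {{1000L, 1500L}, {2000L, 2300L}, {0L, 1000L}};
    LongSummaryStatistics stats = aggregate(attempts);
    System.out.println("min=" + stats.getMin()
        + " max=" + stats.getMax()
        + " avg=" + stats.getAverage());
  }
}
```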





[jira] [Updated] (TEZ-4028) Events not visible from proto history logging for s3a filesystem until dag completes.

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4028:
--
Fix Version/s: 0.10.0

> Events not visible from proto history logging for s3a filesystem until dag 
> completes.
> -
>
> Key: TEZ-4028
> URL: https://issues.apache.org/jira/browse/TEZ-4028
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
>  Labels: history
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4028.01.patch, TEZ-4028.02.patch
>
>
> The events are not visible in the files because the s3 filesystem
> * flushes writes to local disk and only uploads/commits to s3 on close.
> * does not support append
> As an initial fix we log the dag submitted, initialized and started events 
> into a file and these can be read to get the dag plan and config from the AM. 
> The counters are not available until the dag completes anyway.
> The in-progress information cannot be read; it can be obtained from the AM 
> once we have the above events.





[jira] [Updated] (TEZ-3975) Please add OWASP Dependency Check to the build (pom.xml)

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3975:
--
Fix Version/s: 0.10.0

> Please add OWASP Dependency Check to the build (pom.xml)
> 
>
> Key: TEZ-3975
> URL: https://issues.apache.org/jira/browse/TEZ-3975
> Project: Apache Tez
>  Issue Type: New Feature
>Affects Versions: 0.8.next, 0.10.0, 0.10.1
> Environment: All development, build, test, environments.
>Reporter: Albert Baker
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: build, easy-fix, security
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-3975.001.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  Please add OWASP Dependency Check to the build (pom.xml).  OWASP DC makes an 
> outbound REST call to MITRE Common Vulnerabilities & Exposures (CVE) to 
> perform a lookup for each dependent .jar to list any/all known 
> vulnerabilities for each jar.  This step is needed because a manual MITRE CVE 
> lookup/check on the main component does not include checking for 
> vulnerabilities in components or in dependent libraries.
> OWASP Dependency check : 
> https://www.owasp.org/index.php/OWASP_Dependency_Check has plug-ins for most 
> Java build/make types (ant, maven, ivy, gradle).   
> Also, add the appropriate command to the nightly build to generate a report 
> of all known vulnerabilities in any/all third party libraries/dependencies 
> that get pulled in. example : mvn -Powasp -Dtest=false -DfailIfNoTests=false 
> clean aggregate
> Generating this report nightly/weekly will help inform the project's 
> development team if any dependent libraries have reported known 
> vulnerabilities.  Project teams that keep up with removing vulnerabilities on 
> a weekly basis will help protect businesses that rely on these open source 
> components.





[jira] [Updated] (TEZ-4058) Changes for 0.9.2 release

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4058:
--
Fix Version/s: 0.10.0

> Changes for 0.9.2 release
> -
>
> Key: TEZ-4058
> URL: https://issues.apache.org/jira/browse/TEZ-4058
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4058.001.patch
>
>
> Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release.





[jira] [Updated] (TEZ-4223) Adding new jars or resources after the first DAG runs does not work.

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4223:
--
Fix Version/s: 0.10.0

> Adding new jars or resources after the first DAG runs does not work.
> 
>
> Key: TEZ-4223
> URL: https://issues.apache.org/jira/browse/TEZ-4223
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
> Fix For: 0.10.0, 0.10.1, 0.9.3
>
> Attachments: TEZ-4223.02.patch, TEZ-4223.03.patch, TEZ-4223.04.patch
>
>
> If we execute a DAG which needs additional jars after the first DAG is run, we 
> get a ClassNotFoundException.
>  
>  
> {noformat}
> 2020-08-03 13:57:14,776 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: 
> Added additional resources : 
> [[file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-pool-1.5.4.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/postgresql-42.2.8.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/hive-jdbc-handler-3.1.3000.7.2.2.0-73.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/mssql-jdbc-6.2.1.jre7.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-dbcp-1.4.jar]]
>  to classpath
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hive.storage.jdbc.JdbcInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
> ...
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hive.storage.jdbc.JdbcInputFormat
> at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
> at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:398)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
> ... 46 more{noformat}
>  
>  





[jira] [Updated] (TEZ-4042) Speculative attempts should avoid running on the same node

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4042:
--
Fix Version/s: 0.10.0

> Speculative attempts should avoid running on the same node
> --
>
> Key: TEZ-4042
> URL: https://issues.apache.org/jira/browse/TEZ-4042
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Ying Han
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4042.001.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (TEZ-4115) turn on data-via-events as default

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4115:
--
Fix Version/s: 0.10.0

> turn on data-via-events as default
> --
>
> Key: TEZ-4115
> URL: https://issues.apache.org/jira/browse/TEZ-4115
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Richard Zhang
>Assignee: Richard Zhang
>Priority: Minor
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4115.1.patch
>
>
> tez.runtime.transfer.data-via-events.enabled will be set to true by 
> default





[jira] [Updated] (TEZ-4047) Tez trademark in xml is causing xml parsing issue

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4047:
--
Fix Version/s: 0.10.0

> Tez trademark in xml is causing xml parsing issue
> -
>
> Key: TEZ-4047
> URL: https://issues.apache.org/jira/browse/TEZ-4047
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4047.001.patch
>
>
> {code}
> docs/src/site/site.xml:
> [Fatal Error] site.xml:97:34: The entity "reg" was referenced, but not 
> declared.
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
> file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: 
> 34; The entity "reg" was referenced, but not declared.
>   at 
> jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:449)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:406)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155)
>   at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
>   at com.sun.tools.script.shell.Main.evaluateString(Main.java:298)
>   at com.sun.tools.script.shell.Main.evaluateString(Main.java:319)
>   at com.sun.tools.script.shell.Main.access$300(Main.java:37)
>   at com.sun.tools.script.shell.Main$3.run(Main.java:217)
>   at com.sun.tools.script.shell.Main.main(Main.java:48)
> Caused by: org.xml.sax.SAXParseException; systemId: 
> file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: 
> 34; The entity "reg" was referenced, but not declared.
>   at 
> com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
>   at 
> com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
>   at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
>   at 
> jdk.nashorn.internal.scripts.Script$Recompilation$2$19313A$\^system_init\_.XMLDocument(:747)
>   at jdk.nashorn.internal.scripts.Script$1$\^string\_.:program(:1)
>   at 
> jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637)
>   at 
> jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494)
>   at 
> jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393)
>   ... 10 more
> {code}
> Also output from xmllint verifies xml issue as well.
> {code}
> xmllint ./docs/src/site/site.xml
> .//src/site/site.xml:97: parser error : Entity 'reg' not defined
>  http://tez.apache.org/"/>
>  ^
> .//src/site/site.xml:123: parser error : Entity 'reg' not defined
>  
> {code}
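
The parser errors above arise because `&reg;` is an HTML entity; XML predeclares only `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`. A minimal sketch with the JDK's built-in parser shows the difference. The class name and the sample `<banner>` element are made up for this example, and replacing the entity with the numeric character reference `&#174;` is one common remedy, not necessarily what the attached patch does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

final class RegEntitySketch {

  // Returns true if the string is well-formed XML, false if the parser rejects it.
  // On failure the JDK's default error handler also prints a line to stderr.
  static boolean parses(String xml) {
    try {
      DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (SAXException e) {
      return false; // e.g. an entity that was referenced but not declared
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String withEntity = "<banner name=\"Apache Tez&reg;\"/>";     // undeclared entity
    String withCharRef = "<banner name=\"Apache Tez&#174;\"/>";  // numeric reference
    System.out.println("&reg; parses: " + parses(withEntity));
    System.out.println("&#174; parses: " + parses(withCharRef));
  }
}
```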





[jira] [Updated] (TEZ-4076) Add hadoop-cloud-storage jar to aws and azure mvn profiles

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4076:
--
Fix Version/s: 0.10.0

> Add hadoop-cloud-storage jar to aws and azure mvn profiles
> --
>
> Key: TEZ-4076
> URL: https://issues.apache.org/jira/browse/TEZ-4076
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4076.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would make sense to include the dependencies in the 
> {{hadoop-cloud-storage}} jar file when choosing aws or azure profiles.





[jira] [Updated] (TEZ-3952) Allow Tez task speculation to grant greater customization of certain parameters

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3952:
--
Fix Version/s: 0.10.0

> Allow Tez task speculation to grant greater customization of certain 
> parameters
> ---
>
> Key: TEZ-3952
> URL: https://issues.apache.org/jira/browse/TEZ-3952
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Nishant Dash
>Assignee: Nishant Dash
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-3952.001.patch, TEZ-3952.002.patch, 
> TEZ-3952.003.patch, TEZ-3952.004.patch, TEZ-3952.005.patch, TEZ-3952.006.patch
>
>
> Many of the settings for Tez task speculation are hardcoded and should 
> instead be configurable. For example, there's no equivalent config settings 
> for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks





[jira] [Updated] (TEZ-4146) Register RUNNING state in DAG's state change callback

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4146:
--
Fix Version/s: 0.10.0

> Register RUNNING state in DAG's state change callback
> -
>
> Key: TEZ-4146
> URL: https://issues.apache.org/jira/browse/TEZ-4146
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 0.10.0, 0.10.1, 0.9.3
>
> Attachments: TEZ-4146.1.patch, TEZ-4146.2.patch, TEZ-4146.3.patch
>
>
> It would be good to register RUNNING in the DAG state change callbacks. This 
> would help applications like Hive, when it [monitors the 
> job|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java#L182]
>  continuously to get the runtime breakdown at the end of the job.





[jira] [Updated] (TEZ-4052) Fit dot files ASF License issues - part 2

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4052:
--
Fix Version/s: 0.10.0

> Fit dot files ASF License issues - part 2
> -
>
> Key: TEZ-4052
> URL: https://issues.apache.org/jira/browse/TEZ-4052
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4052.001.patch
>
>
> Continuing the effort in TEZ-3995.
> https://issues.apache.org/jira/browse/TEZ-3995?focusedCommentId=16784595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16784595
> {code}
> 1) Please extend this to tez-ext-service-tests 2) Also, please consider 
> directory tez.log.dir with path ${project.build.directory}/logs.
> {code}
> This jira is to make sure all dot files are correctly placed under the target 
> directory, so that 1) files aren't created outside the build directory 
> and 2) they are named as part of a broader test directory design





[jira] [Updated] (TEZ-4040) Upgrade RoaringBitmap version to avoid NoSuchMethodError

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4040:
--
Fix Version/s: 0.10.0

> Upgrade RoaringBitmap version to avoid NoSuchMethodError
> 
>
> Key: TEZ-4040
> URL: https://issues.apache.org/jira/browse/TEZ-4040
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: 0.4.9.api.txt, 0.5.11.api.txt, 0.5.21.api.txt, 
> TEZ-4040.001.patch, TEZ-4040.002.patch
>
>
> A common request is to use the runOptimize function, which is present in later 
> versions of RoaringBitmap





[jira] [Updated] (TEZ-4096) SSLFactory should pickup configs from incoming conf payload

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4096:
--
Fix Version/s: 0.10.0

> SSLFactory should pickup configs from incoming conf payload
> ---
>
> Key: TEZ-4096
> URL: https://issues.apache.org/jira/browse/TEZ-4096
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4096.1.patch, TEZ-4096.2.patch, TEZ-4096.3.patch
>
>
> SSLFactory uses "String" instead of "Path" for adding "ssl-client.xml". When 
> addResource is invoked with a string, {{Configuration}} tries to find it in 
> the classloader and does not load the file correctly.
> [https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/http/SSLFactory.java#L107]
> Conf: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L3064]
> This creates an issue when ssl-client.xml is located in a path outside 
> the classpath.





[jira] [Updated] (TEZ-4091) UnorderedPartitionedKVWriter::readDataForDME should check if in-mem file is flushed or not

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4091:
--
Fix Version/s: 0.10.0

> UnorderedPartitionedKVWriter::readDataForDME should check if in-mem file is 
> flushed or not
> --
>
> Key: TEZ-4091
> URL: https://issues.apache.org/jira/browse/TEZ-4091
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4091.1.patch, TEZ-4091.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It is possible that the in-mem cache has flushed the data out to a file. Before 
> sending the data over the wire, it would be good to check whether the data was 
> flushed out to disk.





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3990:
--
Fix Version/s: 0.10.0

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, TEZ-3990.006.patch
>
>
> In a scenario where fetches for the same mapId keep failing, the penalty code
> allows adding the same Host/InputAttemptIdentifier over and over with a
> revised penalty time that grows exponentially. At some point it should stop
> retrying and report the failure to the AM as soon as possible, so that the
> job can rectify the upstream output.
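
A minimal sketch of the capping idea, under stated assumptions: `PenaltyCapSketch`, its fields and the `GIVE_UP` marker are hypothetical and not taken from the Tez patch. It counts penalties per host/attempt key, doubles the delay each time, and signals that the failure should be reported once the cap is exceeded.

```java
import java.util.HashMap;
import java.util.Map;

class PenaltyCapSketch {
    // Hypothetical marker meaning "stop retrying and report the failure".
    static final long GIVE_UP = -1L;

    private final int maxPenalties;
    private final long baseDelayMs;
    private final Map<String, Integer> penaltyCounts = new HashMap<>();

    PenaltyCapSketch(int maxPenalties, long baseDelayMs) {
        this.maxPenalties = maxPenalties;
        this.baseDelayMs = baseDelayMs;
    }

    // Returns the next penalty delay in milliseconds, or GIVE_UP once the
    // same key has been penalized more than maxPenalties times.
    long penalize(String hostAndAttempt) {
        int count = penaltyCounts.merge(hostAndAttempt, 1, Integer::sum);
        if (count > maxPenalties) {
            penaltyCounts.remove(hostAndAttempt);
            return GIVE_UP;
        }
        // Exponential growth; the shift is bounded to avoid overflow.
        int shift = Math.min(count - 1, 20);
        return baseDelayMs << shift;
    }
}
```

With a cap of 3 and a base delay of 100 ms, the delays are 100, 200 and 400 ms, and the fourth call returns `GIVE_UP`.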





[jira] [Updated] (TEZ-4066) Upgrade servlet-api from 2.5 to 3.1.0

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4066:
--
Fix Version/s: 0.10.0

> Upgrade servlet-api from 2.5 to 3.1.0
> -
>
> Key: TEZ-4066
> URL: https://issues.apache.org/jira/browse/TEZ-4066
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4066.001.patch, TEZ-4066.002.patch
>
>
> Oozie launcher jobs trying to launch Tez jobs now fail to render the Oozie
> Launcher Job AM because servlet-api 2.5 (from Tez) and 3.1.0 (from Hadoop)
> are both on the classpath. Tez should sync with the servlet-api version
> from the Tez master branch, which only supports Hadoop 3+.
> {code}
> 2019-04-30 14:53:02,747 WARN [qtp1213419524-119] 
> org.eclipse.jetty.server.HttpChannel:
> java.lang.NoSuchMethodError: 
> javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:688)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
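
When a NoSuchMethodError like the one above points at a version conflict, a small reflection probe can show which jar a class was loaded from and whether the expected method is present. This is a generic diagnostic sketch, not part of the Tez change; `ClasspathProbeSketch` and its method names are hypothetical.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

class ClasspathProbeSketch {
    // True if the named class is loadable and exposes a public method
    // with the given name (any signature).
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            for (Method m : type.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Reports the jar or directory a class was loaded from.
    static String loadedFrom(String className) {
        try {
            Class<?> type = Class.forName(className);
            CodeSource source = type.getProtectionDomain().getCodeSource();
            return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String type = "javax.servlet.http.HttpServletRequest";
        System.out.println(type + " loaded from " + loadedFrom(type));
        System.out.println("isAsyncStarted present: " + hasPublicMethod(type, "isAsyncStarted"));
    }
}
```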





[jira] [Updated] (TEZ-4174) [Kubernetes] Fetcher should connection failure on SocketException

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4174:
--
Fix Version/s: 0.10.0

> [Kubernetes] Fetcher should connection failure on SocketException
> -
>
> Key: TEZ-4174
> URL: https://issues.apache.org/jira/browse/TEZ-4174
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4174.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fetcher treats a fetch as a connection failure only when http.connect throws
> an exception. In a Kubernetes environment, where there can be intermediate
> proxies, getInputStream on the HTTP connection can throw a connection reset
> error (5xx). These errors should be treated as connection failures as well.
> {code:java}
> 2020-05-08 17:03:54.080  WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch 
> Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, 
> attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, 
> pathComponent=attempt_1588982534035__1_00_00_0_10030, spillType=0, 
> spillId=-1] Informing ShuffleManager:
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:210)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
> at 
> org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748) {code}
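
The proposed classification can be sketched as a small predicate. This is an illustration only: `FetchFailureSketch` and `isConnectionFailure` are hypothetical names, and walking the cause chain is an assumption rather than something the issue specifies.

```java
import java.net.SocketException;

class FetchFailureSketch {
    // A failure counts as a connection failure if it happened while
    // connecting, or if a SocketException (e.g. "Connection reset")
    // appears anywhere in the cause chain of a later error.
    static boolean isConnectionFailure(boolean thrownDuringConnect, Throwable error) {
        if (thrownDuringConnect) {
            return true;
        }
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SocketException) {
                return true;
            }
        }
        return false;
    }
}
```

Because java.net.ConnectException extends SocketException, refused connections are covered by the same check.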





[jira] [Updated] (TEZ-4068) Prevent new speculative attempt after task has issued canCommit to an attempt

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4068:
--
Fix Version/s: 0.10.0

> Prevent new speculative attempt after task has issued canCommit to an attempt
> -
>
> Key: TEZ-4068
> URL: https://issues.apache.org/jira/browse/TEZ-4068
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Ying Han
>Priority: Major
> Fix For: 0.10.0, 0.10.1, 0.9.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When a running attempt calls TaskImpl#canCommit through the taskUmbilical,
> the TaskImpl will issue a "go" if it is the first attempt to do so. Otherwise
> it will issue a "no-go". After commitAttempt is assigned in TaskImpl, no
> other attempt is allowed to succeed at that point. So a speculative attempt
> that is launched after commitAttempt is assigned can never finish before
> the original, since it will always be given a "no-go" in the canCommit
> response. In this jira, I propose to discuss disabling speculative attempts
> after commitAttempt has been assigned.
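
The go/no-go rule and the proposed speculation check can be sketched as follows. `CommitGateSketch` is a hypothetical stand-in and does not mirror the real TaskImpl state machine; only the name commitAttempt comes from the description.

```java
class CommitGateSketch {
    // Attempt that was granted the commit; null until the first canCommit call.
    private Integer commitAttempt;

    // The first caller gets a "go"; every other attempt gets a "no-go".
    synchronized boolean canCommit(int attemptId) {
        if (commitAttempt == null) {
            commitAttempt = attemptId;
        }
        return commitAttempt.intValue() == attemptId;
    }

    // Proposed rule: a speculative attempt is pointless once commitAttempt
    // is assigned, because it could never be allowed to commit.
    synchronized boolean maySpeculate() {
        return commitAttempt == null;
    }
}
```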





[jira] [Updated] (TEZ-4057) Fix Unsorted broadcast shuffle umasks

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4057:
--
Fix Version/s: 0.10.0

> Fix Unsorted broadcast shuffle umasks
> -
>
> Key: TEZ-4057
> URL: https://issues.apache.org/jira/browse/TEZ-4057
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.2
>Reporter: Gopal Vijayaraghavan
>Assignee: Eric Wohlstadter
>Priority: Major
> Fix For: 0.10.0, 0.10.1, 0.9.3
>
> Attachments: TEZ-4057.1.patch
>
>
> {code}
> if (numPartitions == 1 && !pipelinedShuffle) {
>   //special case, where in only one partition is available.
>   finalOutPath = outputFileHandler.getOutputFileForWrite();
>   finalIndexPath = 
> outputFileHandler.getOutputIndexFileForWrite(indexFileSizeEstimate);
>   skipBuffers = true;
>   writer = new IFile.Writer(conf, rfs, finalOutPath, keyClass, valClass,
>   codec, outputRecordsCounter, outputRecordBytesCounter);
> } else {
>   skipBuffers = false;
>   writer = null;
> }
> {code}
> The broadcast events don't update the file umasks, because they have 1 
> partition.
> {code}
> total 8.0K
> -rw------- 1 hive hadoop 15 Mar 27 20:30 file.out
> -rw-r----- 1 hive hadoop 32 Mar 27 20:30 file.out.index
> {code}
> ending up with readable index files and unreadable .out files.
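
One way to avoid depending on the process umask is to set the mode explicitly on both files after they are created. The sketch below uses only java.nio and is not the Tez fix: `OutputPermissionsSketch` and its helpers are hypothetical, and the target mode `rw-r-----` is an assumption taken from the index file in the listing above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class OutputPermissionsSketch {
    // Assumed target mode for both the data file and its index.
    static final Set<PosixFilePermission> SHUFFLE_PERMS =
        PosixFilePermissions.fromString("rw-r-----");

    static boolean posixSupported() {
        return FileSystems.getDefault().supportedFileAttributeViews().contains("posix");
    }

    // Hypothetical helper: creates a stand-in file.out and file.out.index.
    static Path[] createTempOutputs() {
        try {
            Path dir = Files.createTempDirectory("shuffle-sketch");
            Path out = Files.createFile(dir.resolve("file.out"));
            Path index = Files.createFile(dir.resolve("file.out.index"));
            return new Path[] {out, index};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies the same explicit mode to every file, independent of the umask.
    static void applyShufflePermissions(Path... files) {
        try {
            for (Path file : files) {
                Files.setPosixFilePermissions(file, SHUFFLE_PERMS);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Set<PosixFilePermission> permissionsOf(Path file) {
        try {
            return Files.getPosixFilePermissions(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```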





[jira] [Updated] (TEZ-4113) TezUtils::createByteStringFromConf should use snappy instead of DeflaterOutputStream

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4113:
--
Fix Version/s: 0.10.0

> TezUtils::createByteStringFromConf should use snappy instead of 
> DeflaterOutputStream
> 
>
> Key: TEZ-4113
> URL: https://issues.apache.org/jira/browse/TEZ-4113
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Fix For: 0.10.0, 0.10.1
>
> Attachments: Screenshot 2020-01-10 at 6.32.31 AM.png, TEZ-4113.1.patch
>
>
> Under a concurrent workload where many short-running DAGs are submitted from
> Hive, HS2 CPU usage spikes heavily due to
> {{TezUtils::createByteStringFromConf}}
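
The JDK has no Snappy codec, so the sketch below cannot show the proposed change itself. As a stand-in, it shows the DeflaterOutputStream round trip with an explicit compression level, which is where the speed versus ratio trade-off is chosen. `ConfCompressionSketch` is a hypothetical name and none of this is Tez code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class ConfCompressionSketch {
    // Compresses the payload with the given Deflater level,
    // e.g. Deflater.BEST_SPEED or Deflater.BEST_COMPRESSION.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // The stream does not release a caller-supplied Deflater.
            deflater.end();
        }
        return bytes.toByteArray();
    }

    // Inverse of compress().
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                bytes.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```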





[jira] [Updated] (TEZ-4207) Provide approximate number of input records to be processed in UnorderedKVInput

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4207:
--
Fix Version/s: 0.10.0

> Provide approximate number of input records to be processed in 
> UnorderedKVInput
> ---
>
> Key: TEZ-4207
> URL: https://issues.apache.org/jira/browse/TEZ-4207
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4207.1.patch, TEZ-4207.wip.patch
>
>
> There are cases when broadcast data is loaded into a hashtable in upstream
> applications (e.g. Hive). Apps tend to predict the number of entries in the
> hashtable diligently, but there are cases where these estimates can be very
> complicated at compile time.
>  
> Tez can help in such cases by providing an "approximate number of input
> records" counter for the data to be processed in UnorderedKVInput. This is to
> avoid an expensive rehash when hashtable sizes are not estimated correctly. It
> would be good to start with broadcast first and then move on to the unordered
> partitioned case later.
>  
> This would help in predicting the number of entries at runtime and in getting
> better estimates for the hashtable.
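
To illustrate why such a counter helps, the sketch below turns an approximate record count into an initial capacity for a java.util.HashMap, so that the expected entries fit without a resize at the default load factor of 0.75. `HashTableSizingSketch` is a hypothetical name; how Hive would actually consume the counter is not specified in this issue.

```java
import java.util.HashMap;
import java.util.Map;

class HashTableSizingSketch {
    // Smallest capacity request that holds approxRecords entries at load
    // factor 0.75, clamped to HashMap's default (16) and maximum (2^30).
    static int initialCapacityFor(long approxRecords) {
        long needed = (long) Math.ceil(approxRecords / 0.75d);
        return (int) Math.min(Math.max(needed, 16L), 1L << 30);
    }

    // Creates a map presized for the approximate number of input records.
    static <K, V> Map<K, V> newPresizedMap(long approxRecords) {
        return new HashMap<>(initialCapacityFor(approxRecords));
    }
}
```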





[jira] [Updated] (TEZ-3998) Allow CONCURRENT edge property in DAG construction and introduce ConcurrentSchedulingType

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3998:
--
Fix Version/s: 0.10.0

> Allow CONCURRENT edge property in DAG construction and introduce 
> ConcurrentSchedulingType
> -
>
> Key: TEZ-3998
> URL: https://issues.apache.org/jira/browse/TEZ-3998
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Yingda Chen
>Assignee: Yingda Chen
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-3998.001.patch.diff
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is the first task related to TEZ-3997
>  
> |Note: There is no API change in this proposed change. The majority of this 
> change will be lifting some existing constraints against the CONCURRENT edge 
> type, and adding a VertexManagerPlugin implementation.|
>  
> This includes enabling the CONCURRENT SchedulingType as a valid edge 
> property, by removing all the sanity checks against CONCURRENT during DAG 
> construction/execution. A new VertexManagerPlugin (namely 
> VertexManagerWithConcurrentInput) will be implemented for vertices with 
> incoming concurrent edge(s). 
> In addition, we will assume in this change that 
>  * A vertex *cannot* have both SEQUENTIAL and CONCURRENT incoming edges 
>  * No shuffle or data movement is handled by Tez framework when two vertices 
> are connected through a CONCURRENT edge. Instead, runtime should be 
> responsible for handling all the data-plane communications (as proposed in 
> [1]).
> Note that the above assumptions are common for scenarios such as whole-DAG or 
> sub-graph gang scheduling, but they may be relaxed in a later implementation, 
> which may allow a mixture of SEQUENTIAL and CONCURRENT edges on the same vertex.
>  
> Most of the (meaningful) scheduling decisions today in Tez are made based on 
> the notion of (or an extended version of) source task completion. This will 
> no longer be true in the presence of a CONCURRENT edge. Instead, events such as 
> source vertex configured, or source task running, will become more relevant 
> when making a scheduling decision for two vertices connected via a CONCURRENT 
> edge. We therefore introduce a new enum *ConcurrentSchedulingType* to 
> describe the “scheduling timing” for the downstream vertex in such scenarios. 
> |public enum ConcurrentSchedulingType{
>    /** * trigger downstream vertex tasks scheduling by "configured" event of 
> upstream vertices */
>   SOURCE_VERTEX_CONFIGURED,
>    /** * trigger downstream vertex tasks scheduling by "running" event of 
> upstream tasks */ 
>   SOURCE_TASK_STARTED 
> }|
>  
> Note that in this change, we will only use SOURCE_VERTEX_CONFIGURED as the 
> scheduling type, which suffices for scenarios of whole-DAG or sub-graph 
> gang-scheduling, where we want (all the tasks in) the downstream vertex to be 
> scheduled together with (all the tasks in) the upstream vertex. In this case, 
> we can leverage the existing onVertexStateUpdated() interface of 
> VertexManagerPlugin to collect relevant information to assist the scheduling 
> decision, and *there is no additional API change necessary*. However, in more 
> subtle cases such as the parameter-server example described in Fig. 1, other 
> scheduling types would be more relevant, therefore the placeholder for 
> *ConcurrentSchedulingType* will be introduced in this change as part of the 
> infrastructure work.
>  
> Finally, since we assume that all communications between two vertices 
> connected via a CONCURRENT edge are handled by the application runtime, a 
> CONCURRENT edge will be assigned a DummyEdgeManager that basically mutes all 
> DME/VME handling.
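
The two scheduling timings can be read as a simple decision rule. In the sketch below, the enum constants are copied from the description, while `ConcurrentSchedulingSketch`, `shouldScheduleDownstream` and its parameters are hypothetical; the real VertexManagerWithConcurrentInput may work differently.

```java
class ConcurrentSchedulingSketch {
    // Constants as listed in the issue description.
    enum ConcurrentSchedulingType {
        SOURCE_VERTEX_CONFIGURED,
        SOURCE_TASK_STARTED
    }

    // Hypothetical rule: decide whether downstream tasks may be scheduled,
    // given what is currently known about the upstream (source) vertex.
    static boolean shouldScheduleDownstream(ConcurrentSchedulingType type,
                                            boolean sourceVertexConfigured,
                                            int runningSourceTasks) {
        switch (type) {
            case SOURCE_VERTEX_CONFIGURED:
                return sourceVertexConfigured;
            case SOURCE_TASK_STARTED:
                return runningSourceTasks > 0;
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }
}
```

Note that neither branch waits for source task completion, which is the contrast with SEQUENTIAL edges drawn above.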





[jira] [Updated] (TEZ-4037) Add back DAG search status KILLED

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4037:
--
Fix Version/s: 0.10.0

> Add back DAG search status KILLED 
> --
>
> Key: TEZ-4037
> URL: https://issues.apache.org/jira/browse/TEZ-4037
> Project: Apache Tez
>  Issue Type: Task
>  Components: UI
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4037.001.patch
>
>
> https://issues.apache.org/jira/browse/TEZ-2447 removed KILLED because a search
> on this status can sometimes fail to find all KILLED DAGs. This jira re-adds
> the KILLED DAG status search since it still has value; the focus should rather
> be on fixing the DAGs that fail to write the killed status to the history log file.





[jira] [Updated] (TEZ-4041) TestExtServicesWithLocalMode fails in docker

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4041:
--
Fix Version/s: 0.10.0

> TestExtServicesWithLocalMode fails in docker
> 
>
> Key: TEZ-4041
> URL: https://issues.apache.org/jira/browse/TEZ-4041
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4041.001.patch
>
>
> {code}
> 2019-02-13 00:24:33,703 INFO  [DAGAppMaster Thread] service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.tez.dag.app.DAGAppMaster failed in state INITED
> org.apache.tez.dag.api.TezUncheckedException: 
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215)
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184)
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.<init>(TaskCommunicatorManager.java:152)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1088)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:532)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2606)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2603)
>   at org.apache.tez.client.LocalClient$1.run(LocalClient.java:327)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213)
>   ... 12 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.tez.test.service.rpc.TezTestServiceProtocolProtos$SubmitWorkRequestProto$Builder.setUser(TezTestServiceProtocolProtos.java:5549)
>   at 
> org.apache.tez.dag.app.taskcomm.TezTestServiceTaskCommunicatorImpl.<init>(TezTestServiceTaskCommunicatorImpl.java:65)
>   ... 17 more
> {code}





[jira] [Updated] (TEZ-3989) Fix by-laws related to emeritus clause

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3989:
--
Fix Version/s: 0.10.0

> Fix by-laws related to emeritus clause 
> ---
>
> Key: TEZ-3989
> URL: https://issues.apache.org/jira/browse/TEZ-3989
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
>
> The emeritus clause is not valid and needs to be updated.





[jira] [Updated] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3982:
--
Fix Version/s: 0.10.0

> DAGAppMaster and tasks should not report negative or invalid progress
> -
>
> Key: TEZ-3982
> URL: https://issues.apache.org/jira/browse/TEZ-3982
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, 
> TEZ-3982.003.patch, TEZ-3982.004.patch, TEZ-3982.005.branch-0.9.patch
>
>
> The AM fails (AMRMClient expects non-negative progress) if any component
> reports invalid or negative progress. DAGAppMaster/Tasks should check and
> report accordingly to allow the AM to execute.
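
A minimal sketch of the kind of check being asked for, assuming progress is a float in the range 0 to 1. `ProgressSketch.sanitize` is a hypothetical helper, not the method added by the patch, and mapping NaN to 0 is an assumption.

```java
class ProgressSketch {
    // Maps NaN to 0 and clamps everything else into [0, 1], so a reporter
    // never hands a negative or otherwise invalid value to the AM.
    static float sanitize(float progress) {
        if (Float.isNaN(progress)) {
            return 0.0f;
        }
        return Math.max(0.0f, Math.min(1.0f, progress));
    }
}
```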





[jira] [Updated] (TEZ-4032) TEZ will throw "Client cannot authenticate via:[TOKEN, KERBEROS]" when used with HDFS federation(non viewfs, only hdfs schema used).

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4032:
--
Fix Version/s: 0.10.0

> TEZ will throw "Client cannot authenticate via:[TOKEN, KERBEROS]"  when used 
> with HDFS federation(non viewfs, only hdfs schema used). 
> --
>
> Key: TEZ-4032
> URL: https://issues.apache.org/jira/browse/TEZ-4032
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4032.001.patch, TEZ-4032.002.patch, 
> TEZ-4032.003.patch, TEZ-4032.004.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I run a Hive on Tez job on a cluster with HDFS federation and Kerberos. The
> Hadoop cluster has multiple namespaces (hdfs://ns1, hdfs://ns2, hdfs://ns3 ...)
> and we don't use the viewfs scheme. The Hive Tez job throws the following error
> when the table is created in hdfs://ns2 (default configuration fs.defaultFS=hdfs://ns1):
> {code:java}
> 2019-01-21 15:43:46,507 [WARN] [TezChild] |ipc.Client|: Exception encountered 
> while connecting to the server : 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]
> 2019-01-21 15:43:46,507 [INFO] [TezChild] |retry.RetryInvocationHandler|: 
> java.io.IOException: DestHost:destPort docker5.cmss.com:8020 , 
> LocalHost:localPort docker1.cmss.com/10.254.10.116:0. Failed on local 
> exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS], while invoking 
> ClientNamenodeProtocolTranslatorPB.getFileInfo over 
> docker5.cmss.com/10.254.2.106:8020 after 14 failover attempts. Trying to 
> failover after sleeping for 10827ms.
> 2019-01-21 15:43:57,338 [WARN] [TezChild] |ipc.Client|: Exception encountered 
> while connecting to the server : 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]
> 2019-01-21 15:43:57,363 [ERROR] [TezChild] |tez.MapRecordSource|: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing writable (null)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:568)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> DestHost:destPort docker4.cmss.com:8020 , LocalHost:localPort 
> docker1.cmss.com/10.254.10.116:0. Failed on local exception: 
> java.io.IOException: org.apache.hadoop.security.AccessControlException: 
> Client cannot authenticate via:[TOKEN, KERBEROS]
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator

[jira] [Updated] (TEZ-4208) Pipelinesorter uses single SortSpan after spill

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4208:
--
Fix Version/s: 0.10.0

> Pipelinesorter uses single SortSpan after spill
> ---
>
> Key: TEZ-4208
> URL: https://issues.apache.org/jira/browse/TEZ-4208
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4208.1.patch, TEZ-4208.2.patch, q67_sorter.log
>
>
> Though it could have created multiple spans, Tez always uses the first span
> after a spill. It is quite possible that other spans are bigger than the
> first one, due to progressive space allocation. Fixing this would help in
> reducing the number of spills (depending on the job) and would put less load
> on indexcache entries (as fewer files have to be opened).





[jira] [Updated] (TEZ-4036) TestMockDAGAppMaster#testInternalPreemption should assert for failed state

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4036:
--
Fix Version/s: 0.10.0

> TestMockDAGAppMaster#testInternalPreemption should assert for failed state
> --
>
> Key: TEZ-4036
> URL: https://issues.apache.org/jira/browse/TEZ-4036
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4036.001.patch
>
>
> Due to the root cause mentioned in TEZ-3950, the test fails regularly. Until
> the fix for that JIRA is in (which requires a good amount of redesign), add an
> assert for the failed state to the test, as this is now an expected state for the task.





[jira] [Updated] (TEZ-4156) Fix Tez to reuse IPC connections

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4156:
--
Fix Version/s: 0.10.0

> Fix Tez to reuse IPC connections
> 
>
> Key: TEZ-4156
> URL: https://issues.apache.org/jira/browse/TEZ-4156
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4156.1.patch, TEZ-4156.2.patch, TEZ-4156.3.patch, 
> TEZ-4156.4.patch
>
>
> When tracking DAG progress, TezClientUtils ends up creating a new remote user.
> Because of this new UGI creation, IPC connections are not reused internally.
> https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java#L965
> More info from Hadoop side:
> In hadoop's IPC layer, connectionIds are checked based on 
> UserGroupInformation.
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1600
> However, UserGroupInformation comparison is based on ==
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1789
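
The effect of identity-based comparison on a connection cache can be shown without Hadoop. In the sketch below, `UserHandle` is a hypothetical stand-in for a user object that compares by identity, and `ConnectionReuseSketch` is a toy cache rather than Hadoop's IPC client.

```java
import java.util.HashMap;
import java.util.Map;

class ConnectionReuseSketch {
    // Stand-in for a user handle; it does not override equals/hashCode,
    // so two handles for the same name are different cache keys.
    static final class UserHandle {
        final String name;

        UserHandle(String name) {
            this.name = name;
        }
    }

    private final Map<UserHandle, Object> connections = new HashMap<>();

    // Returns the cached connection for this user, creating one on a miss.
    Object connectionFor(UserHandle user) {
        return connections.computeIfAbsent(user, u -> new Object());
    }

    int openConnections() {
        return connections.size();
    }
}
```

Creating a fresh handle per call opens one connection per call, while reusing a single handle keeps the count at one. That is the reuse this issue aims for.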





[jira] [Updated] (TEZ-4172) Let tasks be killed after too many overall attempts

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4172:
--
Fix Version/s: 0.10.0

> Let tasks be killed after too many overall attempts
> ---
>
> Key: TEZ-4172
> URL: https://issues.apache.org/jira/browse/TEZ-4172
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 0.10.0, 0.10.1, 0.9.3
>
> Attachments: TEZ-4172.01.patch, TEZ-4172.02.patch
>
>
> Currently, TaskImpl doesn't consider failing a task if there are too many 
> overall attempts. In case of LLAP, the number of preempted task attempts -> 
> overall task attempts [can grow in a 
> linkedhashmap|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L127].
> In an edge case, where an upstream application (Hive LLAP) cannot cope with a
> problematic query, this can also lead to OOM in the AM, due to the very high
> number of TaskAttemptImpl objects.
> It would be beneficial to be able to limit the overall number of task
> attempts, regardless of whether they failed or were killed.
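
A sketch of the proposed limit, with hypothetical names throughout: `AttemptLimitSketch`, `maxFailedAttempts` and `maxTotalAttempts` are illustrative and do not correspond to real Tez configuration keys. Treating a limit of 0 as "disabled" is also an assumption.

```java
class AttemptLimitSketch {
    private final int maxFailedAttempts;
    private final int maxTotalAttempts; // 0 disables the overall limit
    private int failedAttempts;
    private int totalAttempts;

    AttemptLimitSketch(int maxFailedAttempts, int maxTotalAttempts) {
        this.maxFailedAttempts = maxFailedAttempts;
        this.maxTotalAttempts = maxTotalAttempts;
    }

    // Records an attempt that ended as failed (true) or killed (false) and
    // returns true when the task itself should be failed.
    boolean onAttemptEnded(boolean failed) {
        totalAttempts++;
        if (failed) {
            failedAttempts++;
        }
        boolean tooManyFailed = failedAttempts >= maxFailedAttempts;
        boolean tooManyOverall = maxTotalAttempts > 0 && totalAttempts >= maxTotalAttempts;
        return tooManyFailed || tooManyOverall;
    }
}
```

Without the overall limit, killed (for example preempted) attempts never trip the first condition, which matches the unbounded growth described above.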



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (TEZ-3976) Batch ShuffleManager error report events

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3976:
--
Fix Version/s: 0.10.0

> Batch ShuffleManager error report events
> 
>
> Key: TEZ-3976
> URL: https://issues.apache.org/jira/browse/TEZ-3976
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-3976.1.patch, TEZ-3976.2.patch, TEZ-3976.3.patch, 
> TEZ-3976.4.patch, TEZ-3976.5.patch, TEZ-3976.6.patch, TEZ-3976.7.patch, 
> TEZ-3976.8.patch, TEZ-3976.9.patch
>
>
> The symptom is that a lot of these logs are shown:
> {code:java}
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #0 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0], connectFailed: true
> 2018-06-15T18:09:35,811 WARN  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.Fetcher: copyInputs failed for 
> tasks [InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]]
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0], connectFailed: true
> {code}
> Each of those translates into an event in the AM, which finally crashes due to 
> OOM after around 30 minutes and around 10 million shuffle input errors (and 
> 10 million lines like the previous ones). When the ShuffleManager is closed 
> and the counters are reported, there are many shuffle input errors; some of 
> those logs are:
> {code:java}
> 2018-06-15T17:46:30,988  INFO [TezTR-441963_21_34_4_0_4 
> (152901963_0021_34_04_00_4)] runtime.LogicalIOProcessorRuntimeTask: 
> Final Counters for attempt_152901963_0021_34_04_00_4: Counters: 43 
> [[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, 
> NUM_SHUFFLED_INPUTS=26, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> INPUT_RECORDS_PROCESSED=26, OUTPUT_RECORDS=1, OUTPUT_LARGE_RECORDS=0, 
> OUTPUT_BYTES=779472, OUTPUT_BYTES_WITH_OVERHEAD=779483, 
> OUTPUT_BYTES_PHYSICAL=780146, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, 
> SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, 
> SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_BYTES_TO_DISK=0, 
> SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_PHASE_TIME=52516, 
> FIRST_EVENT_RECEIVED=1, LAST_EVENT_RECEIVED=1185][HIVE 
> RECORDS_OUT_INTERMEDIATE_Reducer_12=1, 
> RECORDS_OUT_OPERATOR_GBY_159=1, 
> RECORDS_OUT_OPERATOR_RS_160=1][TaskCounter_Reducer_12_INPUT_Map_11
>  FIRST_EVENT_RECEIVED=1, INPUT_RECORDS_PROCESSED=26, 
> LAST_EVENT_RECEIVED=1185, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> NUM_SHUFFLED_INPUTS=26, SHUFFLE_BYTES=4207563, 
> SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_DISK_DIRECT=826947, 
> SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=3380616, 
> SHUFFLE_PHASE_TIME=52516][TaskCounter_Reducer_12_OUTPUT_Map_1
>  ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_PHYSICAL=780146, 
> OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, 
> SPILLED_RECORDS=0]]
> 2018-06-15T17:46:32,271 INFO  [TezTR-441963_21_34_3_15_1 ()] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Final Counters for 
> attempt_152901963_0021_34_03_15_1: Counters: 87 [[File System 
> Counters FILE_BYTES_READ=0, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, 
> FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=2344929, 
> HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=5, HDFS_LARGE_READ_OPS=0, 
> HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter 
> SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=1, NUM_FAILED_SHUFFLE_INPUTS=105195, 
> INPUT_RECORDS_PROCESSED=397, INPUT_SPLIT_LENGTH_BYTES=21563271, 
> OUTPUT_RECORDS=15737, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=1235818, 
> OUTPUT_BYTES_WITH_OVERHEAD=1267307, OUTPUT_

[jira] [Updated] (TEZ-4213) Bound appContext executor capacity using a configurable property

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4213:
--
Fix Version/s: 0.10.0

> Bound appContext executor capacity using a configurable property
> 
>
> Key: TEZ-4213
> URL: https://issues.apache.org/jira/browse/TEZ-4213
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: 0.10_blocker
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4213.01.patch, TEZ-4213.02.patch, TEZ-4213.03.patch, 
> TEZ-4213.04.patch, TEZ-4213.05.patch, TEZ-4213.06.patch, TEZ-4213.07.patch, 
> TEZ-4213.08.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After TEZ-4170 was merged, appContext executor pool is also used by the 
> RootInputInitializerManager to speed up SplitGeneration.
> However, this executor pool currently has no capacity limit:
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L624
> The problem that occurs when generating splits for larger inputs (thousands or 
> more) is that it can result in
> {color:red}java.lang.OutOfMemoryError{color}
> that is also reproducible with a test case.
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L130
> To avoid such errors, I propose to limit the capacity of this pool to a 
> configurable value, which could default to, for example, the number of 
> physical cores.
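
A bounded pool of this kind can be sketched with java.util.concurrent alone. This is only an illustration of the proposal, not the TEZ-4213 patch: BoundedExecutors and the maxThreads parameter are hypothetical, and the fallback uses Runtime.availableProcessors(), which reports logical processors rather than physical cores.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedExecutors {
    // Builds a pool with at most maxThreads workers. A non-positive value
    // falls back to the processor count reported by the JVM.
    static ThreadPoolExecutor newBoundedPool(int maxThreads) {
        int threads = maxThreads > 0
            ? maxThreads
            : Runtime.getRuntime().availableProcessors();
        // Core size equals maximum size, so extra work waits in the queue
        // instead of spawning additional threads.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            threads, threads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // Let idle workers exit so an unused pool does not pin threads.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

The queue in this sketch is unbounded, so only the thread count is limited; whether queued tasks also need a cap is a separate design decision.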





[jira] [Updated] (TEZ-4062) Speculative attempt scheduling should be aborted when Task has completed

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4062:
--
Fix Version/s: 0.10.0

> Speculative attempt scheduling should be aborted when Task has completed
> 
>
> Key: TEZ-4062
> URL: https://issues.apache.org/jira/browse/TEZ-4062
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Yingda Chen
>Assignee: Ying Han
>Priority: Major
> Fix For: 0.10.0, 0.10.1, 0.9.3
>
> Attachments: TEZ-4062.001.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In RedundantScheduleTransition (inside TaskImpl), we try to find the oldest 
> running attempt and use it as the causal attempt when doing 
> "addAndScheduleAttempt".
>  
> However, the task may have completed at this moment, i.e., the task attempt 
> that was considered running and long-tailed by the speculator is now completed. 
> In this case, we may not be able to find any unfinished attempt, which will 
> lead to an NPE in the following logic (even without the NPE, it still makes no 
> sense to proceed with scheduling a speculative attempt anyway).
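
The guard described above can be sketched as follows. It is a simplified illustration, not TaskImpl's code: SpeculationGuard, Attempt and shouldSpeculate are hypothetical names, and the point is only that the lookup of the oldest running attempt may legitimately come back empty.

```java
import java.util.List;
import java.util.Optional;

final class SpeculationGuard {
    // Hypothetical attempt record: launch time plus whether it has finished.
    static final class Attempt {
        final long launchTime;
        final boolean finished;

        Attempt(long launchTime, boolean finished) {
            this.launchTime = launchTime;
            this.finished = finished;
        }
    }

    // Finds the oldest unfinished attempt, or empty if none is running.
    static Optional<Attempt> oldestRunning(List<Attempt> attempts) {
        Attempt oldest = null;
        for (Attempt a : attempts) {
            if (!a.finished && (oldest == null || a.launchTime < oldest.launchTime)) {
                oldest = a;
            }
        }
        return Optional.ofNullable(oldest);
    }

    // Speculation only makes sense while the task is incomplete and there is
    // a running attempt to use as the cause of the new attempt.
    static boolean shouldSpeculate(boolean taskCompleted, List<Attempt> attempts) {
        if (taskCompleted) {
            return false;
        }
        return oldestRunning(attempts).isPresent();
    }
}
```

Returning Optional instead of a possibly null attempt forces the caller to handle the "nothing is running" case before scheduling.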





[jira] [Updated] (TEZ-4075) Tez: Reimplement tez.runtime.transfer.data-via-events.enabled

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4075:
--
Fix Version/s: 0.10.0

> Tez: Reimplement tez.runtime.transfer.data-via-events.enabled
> -
>
> Key: TEZ-4075
> URL: https://issues.apache.org/jira/browse/TEZ-4075
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Gopal Vijayaraghavan
>Assignee: Richard Zhang
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4075.10.patch, TEZ-4075.15.patch, TEZ-4075.16.patch, 
> TEZ-4075.enable-dme.16.patch, Tez-4075.5.patch, Tez-4075.8.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This was factored out by TEZ-2196, which does skip buffers for 1-partition 
> data exchanges (therefore goes to disk directly).
> {code}
> if (shufflePayload.hasData()) {
>   DataProto dataProto = shufflePayload.getData();
>   FetchedInput fetchedInput = inputAllocator.allocate(dataProto.getRawLength(),
>       dataProto.getCompressedLength(), srcAttemptIdentifier);
>   moveDataToFetchedInput(dataProto, fetchedInput, hostIdentifier);
>   shuffleManager.addCompletedInputWithData(srcAttemptIdentifier, fetchedInput);
> } else {
>   shuffleManager.addKnownInput(shufflePayload.getHost(),
>       shufflePayload.getPort(), srcAttemptIdentifier, srcIndex);
> }
> {code}
> got removed in 
> https://github.com/apache/tez/commit/1ba1f927c16a1d7c273b6cd1a8553e5269d1541a
> It would be better to buffer up to the 512-byte limit for the event size before 
> writing to disk, since creating a new file always incurs disk traffic, even 
> if the file is eventually being served out of the buffer cache.
> The total overhead of receiving an event, then firing an HTTP call to fetch 
> the data, etc. adds approx. 100-150ms to a query. Transferring the data through 
> the event will skip the disk entirely for this and also remove the extra IOPS 
> incurred.
> This channel is not suitable for large-scale event transport, but 
> specifically the workload here deals with 1-row control tables which consume 
> more bandwidth with HTTP headers and hostnames than the 93-byte payload.





[jira] [Updated] (TEZ-1348) Allow Tez local mode to run against filesystems other than local FS

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-1348:
--
Fix Version/s: 0.10.0

> Allow Tez local mode to run against filesystems other than local FS
> ---
>
> Key: TEZ-1348
> URL: https://issues.apache.org/jira/browse/TEZ-1348
> Project: Apache Tez
>  Issue Type: Sub-task
> Environment: Committed to branch-0.9.
>Reporter: Siddharth Seth
>Assignee: Todd Lipcon
>Priority: Critical
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: tez-1348.patch, tez-1348.patch, tez-1348.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In TEZ-717, I incorrectly thought setting fs.defaultFS programmatically in 
> tez-site would work for local mode.
> Currently the requirement is that tez-site.xml must have fs.defaultFS set to 
> file:///.
> While that works, it doesn't allow for seamless execution in either 
> local-mode or on a cluster.
> The main issue here is that when Inputs / Outputs are configured - they use a 
> version of configuration which reads tez-site, and do not use the 
> configuration from the client itself (which is correct behaviour).
> Not sure what a good way to fix this is 
> 1) It may be possible to override this value each time an instance of 
> Configuration/TezConfiguration is created. One possible way would be to 
> statically add a default resource to Configuration the moment a local client 
> is created.
> 2) Provide information in the contexts on whether this is local or not. This 
> is fairly ugly, and would get in the way of running mixed mode tasks.
> Anyone have other suggestions ?





[jira] [Updated] (TEZ-3988) Update snapshot version in master to 0.10.1-SNAPSHOT

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3988:
--
Fix Version/s: 0.10.0

> Update snapshot version in master to 0.10.1-SNAPSHOT
> 
>
> Key: TEZ-3988
> URL: https://issues.apache.org/jira/browse/TEZ-3988
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-3988.1.patch
>
>






[jira] [Updated] (TEZ-4086) Some Tez examples cannot work with outputPaths on a FS other than the default FS

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4086:
--
Fix Version/s: 0.10.0

> Some Tez examples cannot work with outputPaths on a FS other than the default 
> FS
> 
>
> Key: TEZ-4086
> URL: https://issues.apache.org/jira/browse/TEZ-4086
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4086.01.txt, TEZ-4086.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are several examples which make use of the FileSystem based on the 
> default config.
> This results in failure if the outputPath is on a different FileSystem. (e.g. 
> fs.defaultFS set to HDFS and outputPath for the example set to s3)





[jira] [Updated] (TEZ-3972) Tez DAG can hang when a single task fails to fetch

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3972:
--
Fix Version/s: 0.10.0

> Tez DAG can hang when a single task fails to fetch
> --
>
> Key: TEZ-3972
> URL: https://issues.apache.org/jira/browse/TEZ-3972
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-3972.001.patch, TEZ-3972.002.patch, 
> TEZ-3972.003.patch
>
>
> Description of the hung DAG:
> A DAG with 2 vertices. {{Map}} Vertex has 22k maps, downstream vertex 
> {{Reduce}} has 1009 tasks. All tasks succeed but one, which hangs. This one 
> task (attempt) is doing a local fetch from a node that (now) has a bad disk. 
> It fails to fetch and reports to the AM for the offending input attempt 
> identifiers. However the AM does not schedule a re-run as 
> {{uniquefailedOutputReports}} size is 1 (since only this task attempt failed 
> to fetch) and failure fraction is not met. The denominator for this fraction 
> is the total number of tasks. That causes the re-run to never occur. This 
> JIRA tracks the AM side of the change to alleviate this problem.
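
The arithmetic behind the hang can be sketched as below. The threshold value of 0.1 and the strict "greater than" comparison are assumptions made for illustration, not values confirmed by this issue; FailureFraction is a hypothetical helper.

```java
final class FailureFraction {
    // Fraction of consumer tasks that reported a fetch failure for an output.
    static float fraction(int uniqueFailedReports, int totalConsumerTasks) {
        return (float) uniqueFailedReports / totalConsumerTasks;
    }

    // A re-run is triggered only when the fraction exceeds the threshold.
    static boolean shouldRerun(int uniqueFailedReports, int totalConsumerTasks,
                               float threshold) {
        return fraction(uniqueFailedReports, totalConsumerTasks) > threshold;
    }
}
```

With the numbers from the description, one reporting attempt out of 1009 downstream tasks gives a fraction of about 0.001, far below any threshold near 0.1, so a check based on this fraction alone never schedules the re-run.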





[jira] [Updated] (TEZ-4204) Data race in RootInputInitializerManager

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4204:
--
Fix Version/s: 0.10.0

> Data race in RootInputInitializerManager
> 
>
> Key: TEZ-4204
> URL: https://issues.apache.org/jira/browse/TEZ-4204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Blocker
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4204.1.patch, TEZ-4204.1.patch, TEZ-4204.2.patch
>
>
> After https://issues.apache.org/jira/browse/TEZ-4170 there is a data race for 
> initializerMap in RootInputInitializerManager. initializerMap should be 
> initialized before vertex state is set to initializing.





[jira] [Updated] (TEZ-4044) Zookeeper: exclude jline from Zookeeper client from tez dist

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4044:
--
Fix Version/s: 0.10.0

> Zookeeper: exclude jline from Zookeeper client from tez dist
> 
>
> Key: TEZ-4044
> URL: https://issues.apache.org/jira/browse/TEZ-4044
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Gopal Vijayaraghavan
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4044.1.patch
>
>
> {code}
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.4.9:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> {code}
> Breaks CLI clients further down the dependency tree.





[jira] [Updated] (TEZ-4021) API incompatibility wro4j-maven-plugin

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4021:
--
Fix Version/s: 0.10.0

> API incompatibility wro4j-maven-plugin
> --
>
> Key: TEZ-4021
> URL: https://issues.apache.org/jira/browse/TEZ-4021
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4021.001.patch
>
>






[jira] [Updated] (TEZ-4102) Let session credentials be merged before merging am launch credentials

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4102:
--
Fix Version/s: 0.10.0

> Let session credentials be merged before merging am launch credentials
> --
>
> Key: TEZ-4102
> URL: https://issues.apache.org/jira/browse/TEZ-4102
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4102.01.patch, TEZ-4102.02.patch, TEZ-4102.03.patch, 
> TEZ-4102.04.patch, TEZ-4102.05.patch, TEZ-4102.06.patch
>
>
> Given the following scenario: Kerberos + long-running session + DAGs keep 
> being submitted to the session.
> After 24h the queries can fail, because tasks don't have the correct 
> HDFS_DELEGATION_TOKEN: there is a chance that the AM credentials have 
> previously been filled with tokens, which cannot be overridden by the session 
> credentials 
> [here|https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java#L485]
>  





[jira] [Updated] (TEZ-4098) tez-tools improvements: log-split, swimlane

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4098:
--
Fix Version/s: 0.10.0

> tez-tools improvements: log-split, swimlane
> ---
>
> Key: TEZ-4098
> URL: https://issues.apache.org/jira/browse/TEZ-4098
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4098.01.patch, TEZ-4098.02.patch, TEZ-4098.03.patch, 
> TEZ-4098.03.patch, TEZ-4098.03.patch, TEZ-4098.03.patch, TEZ-4098.04.patch, 
> TEZ-4098.05.patch
>
>
> While using tez-tools for analyzing application logs, I'm about to improve 
> them a little bit. Details will be added here to the description.
> 1. Support swimlane.sh to consume local file
> 2. Create a log splitter, which is able to split the aggregated log file into 
> separate container directories, like below:
> {code}
> ├── container_e02_1572948601374_0004_01_01
> │   ├── container-localizer-syslog
> │   ├── dag_1572948601374_0004_1.dot
> │   ├── prelaunch.err
> │   ├── prelaunch.out
> │   ├── stderr
> │   ├── stdout
> │   ├── syslog
> │   ├── syslog_dag_1572948601374_0004_1
> │   └── syslog_dag_1572948601374_0004_1_post
> ├── container_e02_1572948601374_0004_01_02
> │   ├── prelaunch.err
> │   ├── prelaunch.out
> │   ├── stderr
> │   ├── stdout
> {code}





[jira] [Updated] (TEZ-4179) [Kubernetes] Extend NodeId in tez to support unique worker identity

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4179:
--
Fix Version/s: 0.10.0

> [Kubernetes] Extend NodeId in tez to support unique worker identity
> ---
>
> Key: TEZ-4179
> URL: https://issues.apache.org/jira/browse/TEZ-4179
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4179.1.patch, TEZ-4179.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In a Kubernetes environment where pods can have the same host name and port, 
> there can be situations where node trackers retain an old instance of the 
> pod in their cache. In the case of Hive LLAP, where the LLAP Tez task scheduler 
> maintains the membership of nodes based on ZooKeeper registry events, there 
> can be cases where a NODE_ADDED event followed by a NODE_REMOVED event ends up 
> removing the node/host from the node trackers because of the stable hostname 
> and service port. The NODE_REMOVED event in this case is an old, stale event of 
> the already dead pod, but ZK will send it only after the session timeout (in 
> case of non-graceful shutdown). If this sequence of events happens, a node/host 
> is completely lost from the scheduler's perspective. 
> To support this scenario, Tez can extend YARN's NodeId to include a 
> uniqueIdentifier. The LLAP task scheduler can construct the container object 
> with this new NodeId that includes the uniqueIdentifier as well, so that stale 
> events like the above will only remove the host/node that matches the old 
> uniqueIdentifier. 
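
The idea can be sketched without any YARN classes. UniqueNodeId and NodeTracker below are hypothetical stand-ins, not the types added by the patch; the sketch only shows that once the identifier takes part in equality, a stale NODE_REMOVED for a dead pod no longer evicts the live pod that reuses the same host and port.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical node id: host and port plus a per-instance unique identifier.
final class UniqueNodeId {
    final String host;
    final int port;
    final String uniqueIdentifier;

    UniqueNodeId(String host, int port, String uniqueIdentifier) {
        this.host = host;
        this.port = port;
        this.uniqueIdentifier = uniqueIdentifier;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UniqueNodeId)) {
            return false;
        }
        UniqueNodeId other = (UniqueNodeId) o;
        return port == other.port
            && host.equals(other.host)
            && uniqueIdentifier.equals(other.uniqueIdentifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, uniqueIdentifier);
    }
}

// Hypothetical tracker keyed by address, holding the currently live instance.
final class NodeTracker {
    private final Map<String, UniqueNodeId> liveByAddress = new HashMap<>();

    void nodeAdded(UniqueNodeId id) {
        liveByAddress.put(id.host + ":" + id.port, id);
    }

    // Removes the entry only if the event refers to the instance that is
    // currently tracked; a stale event for an older instance is ignored.
    boolean nodeRemoved(UniqueNodeId id) {
        return liveByAddress.remove(id.host + ":" + id.port, id);
    }

    boolean isLive(String host, int port) {
        return liveByAddress.containsKey(host + ":" + port);
    }
}
```

Map.remove(key, value) only removes the mapping when the stored value equals the given one, which is what makes the late event harmless here.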





[jira] [Updated] (TEZ-4035) Tez master breaks with YARN 3.2.0 ApplicationReport API change

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4035:
--
Fix Version/s: 0.10.0

> Tez master breaks with YARN 3.2.0 ApplicationReport API change
> --
>
> Key: TEZ-4035
> URL: https://issues.apache.org/jira/browse/TEZ-4035
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Jonathan Turner Eagles
>Priority: Minor
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4035.001.patch
>
>
> {noformat}
> tez/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/NotRunningJob.java:[89,29]
>  no suitable method found for 
> newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,int,int,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,,java.lang.String,float,java.lang.String,)
> [ERROR] method 
> org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token)
>  is not applicable
> [ERROR] (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String)
>  is not applicable
> [ERROR] (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String)
>  is not applicable{noformat}





[jira] [Updated] (TEZ-4049) Fix findbugs issues in NotRunningJob

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4049:
--
Fix Version/s: 0.10.0

> Fix findbugs issues in NotRunningJob
> 
>
> Key: TEZ-4049
> URL: https://issues.apache.org/jira/browse/TEZ-4049
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4049.001.patch
>
>
> Introduced by TEZ-4035. Remove fixes while keeping 3.2.0 api compatibility. 





[jira] [Updated] (TEZ-4050) maven site is failing due to missing configuration.

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4050:
--
Fix Version/s: 0.10.0

> maven site is failing due to missing configuration.
> ---
>
> Key: TEZ-4050
> URL: https://issues.apache.org/jira/browse/TEZ-4050
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4050.001.patch
>
>
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.4:stage (default-cli) on project 
> tez-docs: Missing site information in the distribution management of the 
> project Tez (org.apache.tez:tez-docs:0.10.1-SNAPSHOT) -> [Help 1]
> {code}
> From maven site plugin usage we can see we are missing configuration.
> https://maven.apache.org/plugins/maven-site-plugin/usage.html
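> {code}
> <project>
>   ...
>   <distributionManagement>
>     <site>
>       <id>www.yourcompany.com</id>
>       <url>scp://www.yourcompany.com/www/docs/project/</url>
>     </site>
>   </distributionManagement>
>   ...
> </project>
> {code}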
> Tez does not use this url to deploy and neither does hadoop. But it is needed 
> to stage site documentation. The url is only used during site:deploy, which is 
> never called during the Tez QA step.
> This jira aims to provide a placeholder (the same as hadoop).





[jira] [Updated] (TEZ-4048) Make proto history logger queue size configurable

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4048:
--
Fix Version/s: 0.10.0

> Make proto history logger queue size configurable
> -
>
> Key: TEZ-4048
> URL: https://issues.apache.org/jira/browse/TEZ-4048
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4048.1.patch
>
>
> Currently, the queue size is hard-coded to 10K, which may be small for some 
> bigger clusters. Make it configurable and bump up the default. 





[jira] [Updated] (TEZ-4012) Add docker support for Tez.

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4012:
--
Fix Version/s: 0.10.0

> Add docker support for Tez.
> ---
>
> Key: TEZ-4012
> URL: https://issues.apache.org/jira/browse/TEZ-4012
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4012.001.patch, TEZ-4012.002.patch, 
> TEZ-4012.003.patch
>
>
> Hadoop label builds contain a mix of development tools and versions. In 
> particular H11-H20 are unusable by tez since protoc -version is 2.6.x and 
> hadoop only supports 2.5.0. This jira will allow builds across all H1-H20 
> jenkins machines.





[jira] [Updated] (TEZ-4206) TestSpeculation.testBasicSpeculationPerVertexConf is flaky

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4206:
--
Fix Version/s: 0.10.0

> TestSpeculation.testBasicSpeculationPerVertexConf is flaky
> --
>
> Key: TEZ-4206
> URL: https://issues.apache.org/jira/browse/TEZ-4206
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
> Fix For: 0.10.0, 0.10.1, 0.9.3
>
> Attachments: TEZ-4206.1.patch
>
>
> Test is flaky due to a timing issue in MockDAGAppMaster's clock and 
> LegacySpeculator
> [https://builds.apache.org/job/PreCommit-TEZ-Build/491/]
> [https://builds.apache.org/job/PreCommit-TEZ-Build/492/]
> [https://builds.apache.org/job/PreCommit-TEZ-Build/493/]
>  





[jira] [Updated] (TEZ-4034) Column selector filter should be case-insensitive

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4034:
--
Fix Version/s: 0.10.0

> Column selector filter should be case-insensitive
> -
>
> Key: TEZ-4034
> URL: https://issues.apache.org/jira/browse/TEZ-4034
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jacob Tolar
>Assignee: Jacob Tolar
>Priority: Minor
> Fix For: 0.9.2, 0.10.0, 0.10.1
>
> Attachments: TEZ-4034.1.patch, image-2019-02-01-09-33-24-480.png
>
>
> In this dialog box: 
>  
> !image-2019-02-01-09-33-24-480.png!
>  
> The filter is case-sensitive. So if I type lower-case 'd', I see 'Id' but not 
> 'Dag Name'. It would be nice if the search ignored case.
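
The requested behaviour amounts to normalising case on both sides before matching. The Tez UI itself is not written in Java; the sketch below uses Java only to illustrate the logic, and ColumnFilter is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ColumnFilter {
    // Keeps the columns whose name contains the query, ignoring case.
    static List<String> filter(List<String> columns, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String column : columns) {
            if (column.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(column);
            }
        }
        return matches;
    }
}
```

With the columns "Id", "Dag Name" and "Status", a query of lower-case "d" then matches both "Id" and "Dag Name".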





[jira] [Updated] (TEZ-4088) Create in-memory ifile writer for transferring smaller payloads (follow up of TEZ-4075)

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4088:
--
Fix Version/s: 0.10.0

> Create in-memory ifile writer for transferring smaller payloads (follow up of 
> TEZ-4075)
> ---
>
> Key: TEZ-4088
> URL: https://issues.apache.org/jira/browse/TEZ-4088
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4088.1.patch, TEZ-4088.2.patch, TEZ-4088.3.patch, 
> TEZ-4088.5.patch, TEZ-4088.6.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> TEZ-4075 enabled data transfer over DME for smaller payloads. This helps in 
> reducing shuffle. 
> However, it still incurs disk IO cost (+flush) on the producer side. It would be 
> good to retain smaller payloads in memory, so that disk IO costs can be saved.





[jira] [Updated] (TEZ-4048) Make proto history logger queue size configurable

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4048:
--
Fix Version/s: (was: 0.10.1)

> Make proto history logger queue size configurable
> -
>
> Key: TEZ-4048
> URL: https://issues.apache.org/jira/browse/TEZ-4048
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4048.1.patch
>
>
> Currently, the queue size is hard-coded to 10K, which may be too small for some 
> bigger clusters. Make it configurable and bump up the default. 





[jira] [Updated] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3982:
--
Fix Version/s: (was: 0.10.1)

> DAGAppMaster and tasks should not report negative or invalid progress
> -
>
> Key: TEZ-3982
> URL: https://issues.apache.org/jira/browse/TEZ-3982
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, 
> TEZ-3982.003.patch, TEZ-3982.004.patch, TEZ-3982.005.branch-0.9.patch
>
>
> The AM fails (AMRMClient expects non-negative progress) if any component reports 
> invalid or negative progress. DAGAppMaster and tasks should check the value and 
> report a valid one so that the AM can keep executing.
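
A small Java sketch of the kind of check described above. It is an illustration under stated assumptions, not the attached patch: the class name `ProgressSanitizer`, the valid range of 0.0 to 1.0, and the choice to map NaN to 0.0 are all assumptions.

```java
// Hypothetical helper; the valid range [0, 1] and the NaN handling are assumptions.
final class ProgressSanitizer {

    /** Returns a progress value that is safe to report: never NaN, never outside [0, 1]. */
    static float sanitize(float progress) {
        if (Float.isNaN(progress)) {
            return 0.0f;            // invalid value, fall back to "no progress"
        }
        if (progress < 0.0f) {
            return 0.0f;            // covers negative values and -Infinity
        }
        if (progress > 1.0f) {
            return 1.0f;            // covers values above 100% and +Infinity
        }
        return progress;
    }

    public static void main(String[] args) {
        System.out.println(sanitize(-0.5f));
        System.out.println(sanitize(Float.NaN));
        System.out.println(sanitize(0.42f));
    }
}
```

Applying such a check at the point where progress is handed to the resource manager client would keep one misbehaving component from taking down the AM.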





[jira] [Updated] (TEZ-4044) Zookeeper: exclude jline from Zookeeper client from tez dist

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4044:
--
Fix Version/s: (was: 0.10.1)

> Zookeeper: exclude jline from Zookeeper client from tez dist
> 
>
> Key: TEZ-4044
> URL: https://issues.apache.org/jira/browse/TEZ-4044
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Gopal Vijayaraghavan
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4044.1.patch
>
>
> {code}
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.4.9:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> {code}
> Breaks CLI clients further down the dependency tree.





[jira] [Updated] (TEZ-4037) Add back DAG search status KILLED

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4037:
--
Fix Version/s: (was: 0.10.1)

> Add back DAG search status KILLED 
> --
>
> Key: TEZ-4037
> URL: https://issues.apache.org/jira/browse/TEZ-4037
> Project: Apache Tez
>  Issue Type: Task
>  Components: UI
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4037.001.patch
>
>
> https://issues.apache.org/jira/browse/TEZ-2447 removed KILLED because a search 
> on this status can sometimes miss KILLED DAGs. This jira re-adds the KILLED DAG 
> status search since it still has value; the focus should instead be on fixing 
> the DAGs that fail to write the killed status to the history log file. 





[jira] [Updated] (TEZ-4174) [Kubernetes] Fetcher should connection failure on SocketException

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4174:
--
Fix Version/s: (was: 0.10.1)

> [Kubernetes] Fetcher should connection failure on SocketException
> -
>
> Key: TEZ-4174
> URL: https://issues.apache.org/jira/browse/TEZ-4174
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4174.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fetcher treats an error as a connection failure only when http.connect throws 
> an exception. In a Kubernetes environment, where there can be intermediate 
> proxies, getInputStream on the HTTP connection can throw a connection reset 
> error (5xx). These errors should be considered connection failures as well.
> {code:java}
> 2020-05-08 17:03:54.080  WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1588982534035__1_00_00_0_10030, spillType=0, spillId=-1] Informing ShuffleManager:
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:210)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
> at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285)
> at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
> at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Updated] (TEZ-4042) Speculative attempts should avoid running on the same node

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4042:
--
Fix Version/s: (was: 0.10.1)

> Speculative attempts should avoid running on the same node
> --
>
> Key: TEZ-4042
> URL: https://issues.apache.org/jira/browse/TEZ-4042
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Ying Han
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4042.001.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (TEZ-4035) Tez master breaks with YARN 3.2.0 ApplicationReport API change

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4035:
--
Fix Version/s: (was: 0.10.1)

> Tez master breaks with YARN 3.2.0 ApplicationReport API change
> --
>
> Key: TEZ-4035
> URL: https://issues.apache.org/jira/browse/TEZ-4035
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Jonathan Turner Eagles
>Priority: Minor
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4035.001.patch
>
>
> {noformat}
> tez/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/NotRunningJob.java:[89,29]
>  no suitable method found for 
> newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,int,int,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,,java.lang.String,float,java.lang.String,)
> [ERROR] method 
> org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token)
>  is not applicable
> [ERROR] (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String)
>  is not applicable
> [ERROR] (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String)
>  is not applicable{noformat}





[jira] [Updated] (TEZ-4021) API incompatibility wro4j-maven-plugin

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4021:
--
Fix Version/s: (was: 0.10.1)

> API incompatibility wro4j-maven-plugin
> --
>
> Key: TEZ-4021
> URL: https://issues.apache.org/jira/browse/TEZ-4021
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4021.001.patch
>
>






[jira] [Updated] (TEZ-3952) Allow Tez task speculation to grant greater customization of certain parameters

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3952:
--
Fix Version/s: (was: 0.10.1)

> Allow Tez task speculation to grant greater customization of certain 
> parameters
> ---
>
> Key: TEZ-3952
> URL: https://issues.apache.org/jira/browse/TEZ-3952
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Nishant Dash
>Assignee: Nishant Dash
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-3952.001.patch, TEZ-3952.002.patch, 
> TEZ-3952.003.patch, TEZ-3952.004.patch, TEZ-3952.005.patch, TEZ-3952.006.patch
>
>
> Many of the settings for Tez task speculation are hardcoded and should 
> instead be configurable. For example, there are no equivalent config settings 
> for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks





[jira] [Updated] (TEZ-4113) TezUtils::createByteStringFromConf should use snappy instead of DeflaterOutputStream

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4113:
--
Fix Version/s: (was: 0.10.1)

> TezUtils::createByteStringFromConf should use snappy instead of 
> DeflaterOutputStream
> 
>
> Key: TEZ-4113
> URL: https://issues.apache.org/jira/browse/TEZ-4113
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Fix For: 0.10.0
>
> Attachments: Screenshot 2020-01-10 at 6.32.31 AM.png, TEZ-4113.1.patch
>
>
> Under concurrent workload, where lots of short running DAGs were submitted in 
> Hive, HS2 spikes up heavily on CPU due to 
> {{TezUtils::createByteStringFromConf}}





[jira] [Updated] (TEZ-3989) Fix by-laws related to emeritus clause

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3989:
--
Fix Version/s: (was: 0.10.1)

> Fix by-laws related to emeritus clause 
> ---
>
> Key: TEZ-3989
> URL: https://issues.apache.org/jira/browse/TEZ-3989
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Major
> Fix For: 0.10.0
>
>
> The emeritus clause is not valid and needs to be updated.





[jira] [Updated] (TEZ-4066) Upgrade servlet-api from 2.5 to 3.1.0

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4066:
--
Fix Version/s: (was: 0.10.1)

> Upgrade servlet-api from 2.5 to 3.1.0
> -
>
> Key: TEZ-4066
> URL: https://issues.apache.org/jira/browse/TEZ-4066
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4066.001.patch, TEZ-4066.002.patch
>
>
> Oozie launcher jobs trying to launch Tez jobs now fail to render the Oozie 
> Launcher Job AM because both servlet-api 2.5 (from Tez) and 3.1.0 (from 
> Hadoop) are on the classpath. Tez should sync with the servlet-api version 
> from the Tez master branch, which only supports Hadoop 3+.
> {code}
> 2019-04-30 14:53:02,747 WARN [qtp1213419524-119] org.eclipse.jetty.server.HttpChannel:
> java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:688)
>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Updated] (TEZ-4172) Let tasks be killed after too many overall attempts

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4172:
--
Fix Version/s: (was: 0.10.1)

> Let tasks be killed after too many overall attempts
> ---
>
> Key: TEZ-4172
> URL: https://issues.apache.org/jira/browse/TEZ-4172
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 0.10.0, 0.9.3
>
> Attachments: TEZ-4172.01.patch, TEZ-4172.02.patch
>
>
> Currently, TaskImpl doesn't consider failing a task if there are too many 
> overall attempts. In the case of LLAP, the number of preempted task attempts, 
> and therefore of overall task attempts, [can grow in a 
> linkedhashmap|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L127].
> In an edge case, where an upstream application (Hive LLAP) cannot cope with a 
> problematic query, this can also lead to OOM in the AM, due to the very high 
> number of TaskAttemptImpl objects.
> It would be beneficial to be able to limit the overall number of task 
> attempts, regardless of whether they failed or were killed.
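
A rough Java sketch of such a limit, as an illustration only. `AttemptBudget`, its constructor argument and the "zero or negative disables the check" convention are hypothetical and not taken from the patches attached to this issue.

```java
// Hypothetical sketch of an overall-attempt limit; names and semantics are assumptions.
final class AttemptBudget {

    private final int maxTotalAttempts; // <= 0 means "no limit" in this sketch
    private int finishedAttempts;       // counts failed AND killed attempts alike

    AttemptBudget(int maxTotalAttempts) {
        this.maxTotalAttempts = maxTotalAttempts;
    }

    /**
     * Records one finished attempt (failed or killed) and reports whether the
     * task as a whole should now be failed instead of scheduling another attempt.
     */
    boolean recordAttemptAndCheckLimit() {
        finishedAttempts++;
        return maxTotalAttempts > 0 && finishedAttempts >= maxTotalAttempts;
    }

    public static void main(String[] args) {
        AttemptBudget budget = new AttemptBudget(3);
        System.out.println(budget.recordAttemptAndCheckLimit());
        System.out.println(budget.recordAttemptAndCheckLimit());
        System.out.println(budget.recordAttemptAndCheckLimit());
    }
}
```

Counting killed attempts together with failed ones is what bounds the number of attempt objects kept per task in the preemption-heavy case described above.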





[jira] [Updated] (TEZ-4068) Prevent new speculative attempt after task has issued canCommit to an attempt

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4068:
--
Fix Version/s: (was: 0.10.1)

> Prevent new speculative attempt after task has issued canCommit to an attempt
> -
>
> Key: TEZ-4068
> URL: https://issues.apache.org/jira/browse/TEZ-4068
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Ying Han
>Priority: Major
> Fix For: 0.10.0, 0.9.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When a running attempt calls TaskImpl#canCommit through the taskUmbilical, 
> the TaskImpl will issue a "go" if it is the first attempt to do so. Otherwise 
> it will issue a "no-go". After commitAttempt is assigned in TaskImpl, no 
> other attempt is allowed to succeed. So a speculative attempt 
> that is launched after commitAttempt is assigned can never finish before 
> the original, since it will always be given a "no-go" in the canCommit 
> response. In this jira, I propose to discuss disabling speculative attempts 
> after commitAttempt has been assigned.
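
The go/no-go rule and the proposed restriction can be sketched in a few lines of Java. This is a simplified model, not TaskImpl: `CommitGate` and its method names are hypothetical, attempts are identified by plain strings, and answering "go" again to the attempt that already holds the commit is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified, hypothetical model of the commit hand-off described above.
final class CommitGate {

    // The attempt that was given a "go"; null until some attempt asks.
    private final AtomicReference<String> commitAttempt = new AtomicReference<>();

    /** "go" (true) for the first attempt that asks, "no-go" (false) for every other attempt. */
    boolean canCommit(String attemptId) {
        if (commitAttempt.compareAndSet(null, attemptId)) {
            return true;
        }
        // Assumption: the attempt that already holds the commit gets "go" again.
        return attemptId.equals(commitAttempt.get());
    }

    /** Proposed rule: once a commit attempt is assigned, a new speculative attempt is pointless. */
    boolean mayLaunchSpeculativeAttempt() {
        return commitAttempt.get() == null;
    }

    public static void main(String[] args) {
        CommitGate gate = new CommitGate();
        System.out.println(gate.mayLaunchSpeculativeAttempt());
        System.out.println(gate.canCommit("attempt_0"));
        System.out.println(gate.canCommit("attempt_1"));
        System.out.println(gate.mayLaunchSpeculativeAttempt());
    }
}
```

Any attempt started after the gate is taken can only ever receive "no-go", which is why launching it wastes resources.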





[jira] [Updated] (TEZ-4041) TestExtServicesWithLocalMode fails in docker

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4041:
--
Fix Version/s: (was: 0.10.1)

> TestExtServicesWithLocalMode fails in docker
> 
>
> Key: TEZ-4041
> URL: https://issues.apache.org/jira/browse/TEZ-4041
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4041.001.patch
>
>
> {code}
> 2019-02-13 00:24:33,703 INFO  [DAGAppMaster Thread] service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.tez.dag.app.DAGAppMaster failed in state INITED
> org.apache.tez.dag.api.TezUncheckedException: java.lang.reflect.InvocationTargetException
>   at org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215)
>   at org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184)
>   at org.apache.tez.dag.app.TaskCommunicatorManager.<init>(TaskCommunicatorManager.java:152)
>   at org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1088)
>   at org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:532)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2606)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>   at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2603)
>   at org.apache.tez.client.LocalClient$1.run(LocalClient.java:327)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213)
>   ... 12 more
> Caused by: java.lang.NullPointerException
>   at org.apache.tez.test.service.rpc.TezTestServiceProtocolProtos$SubmitWorkRequestProto$Builder.setUser(TezTestServiceProtocolProtos.java:5549)
>   at org.apache.tez.dag.app.taskcomm.TezTestServiceTaskCommunicatorImpl.<init>(TezTestServiceTaskCommunicatorImpl.java:65)
>   ... 17 more
> {code}





[jira] [Updated] (TEZ-4096) SSLFactory should pickup configs from incoming conf payload

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4096:
--
Fix Version/s: (was: 0.10.1)

> SSLFactory should pickup configs from incoming conf payload
> ---
>
> Key: TEZ-4096
> URL: https://issues.apache.org/jira/browse/TEZ-4096
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4096.1.patch, TEZ-4096.2.patch, TEZ-4096.3.patch
>
>
> SSLFactory uses "String" instead of "Path" for adding "ssl-client.xml". When 
> addResource is invoked with a string, {{Configuration}} tries to find it via 
> the classloader and does not load the file correctly.
> [https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/http/SSLFactory.java#L107]
> Conf: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L3064]
> This creates an issue when ssl-client.xml is located in a path outside 
> the classpath.
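
The difference between the two lookups can be shown with the JDK alone. The sketch below does not use Hadoop's Configuration; it only contrasts a classloader lookup by name with a filesystem lookup by path, which is the distinction the description draws between the String and Path variants. `ResourceLookup` and the temporary file name are made up for the demonstration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo; it stands in for the String-vs-Path distinction, not for Hadoop code.
final class ResourceLookup {

    /** Lookup by name through the classloader: only finds files that are on the classpath. */
    static boolean foundOnClasspath(String name) {
        return ResourceLookup.class.getClassLoader().getResource(name) != null;
    }

    /** Lookup by filesystem path: finds the file wherever it lives. */
    static boolean foundOnFileSystem(Path path) {
        return Files.isRegularFile(path);
    }

    /** Creates a file outside the classpath and returns {classpath hit, filesystem hit}. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("conf-outside-classpath");
            Path file = Files.createFile(dir.resolve("ssl-client-" + System.nanoTime() + ".xml"));
            boolean viaClasspath = foundOnClasspath(file.getFileName().toString());
            boolean viaPath = foundOnFileSystem(file);
            Files.delete(file);
            Files.delete(dir);
            return new boolean[] {viaClasspath, viaPath};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("classpath lookup found it: " + result[0]);
        System.out.println("path lookup found it: " + result[1]);
    }
}
```

A file that exists on disk but is not on the classpath is invisible to the name-based lookup, which matches the failure mode reported here.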





[jira] [Updated] (TEZ-4047) Tez trademark in xml is causing xml parsing issue

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4047:
--
Fix Version/s: (was: 0.10.1)

> Tez trademark in xml is causing xml parsing issue
> -
>
> Key: TEZ-4047
> URL: https://issues.apache.org/jira/browse/TEZ-4047
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4047.001.patch
>
>
> {code}
> docs/src/site/site.xml:
> [Fatal Error] site.xml:97:34: The entity "reg" was referenced, but not 
> declared.
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
> file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: 
> 34; The entity "reg" was referenced, but not declared.
>   at 
> jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:449)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:406)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155)
>   at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
>   at com.sun.tools.script.shell.Main.evaluateString(Main.java:298)
>   at com.sun.tools.script.shell.Main.evaluateString(Main.java:319)
>   at com.sun.tools.script.shell.Main.access$300(Main.java:37)
>   at com.sun.tools.script.shell.Main$3.run(Main.java:217)
>   at com.sun.tools.script.shell.Main.main(Main.java:48)
> Caused by: org.xml.sax.SAXParseException; systemId: 
> file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: 
> 34; The entity "reg" was referenced, but not declared.
>   at 
> com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
>   at 
> com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
>   at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
>   at 
> jdk.nashorn.internal.scripts.Script$Recompilation$2$19313A$\^system_init\_.XMLDocument(:747)
>   at jdk.nashorn.internal.scripts.Script$1$\^string\_.:program(:1)
>   at 
> jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637)
>   at 
> jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494)
>   at 
> jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393)
>   ... 10 more
> {code}
> The output from xmllint confirms the XML issue as well.
> {code}
> xmllint ./docs/src/site/site.xml
> .//src/site/site.xml:97: parser error : Entity 'reg' not defined
>  http://tez.apache.org/"/>
>  ^
> .//src/site/site.xml:123: parser error : Entity 'reg' not defined
>  
> {code}
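
The parser error can be reproduced with the JDK's own XML parser. In the sketch below the element and attribute names are invented, since the affected line of site.xml is not shown in full above. XML predefines only a handful of named entities, so `&reg;` is undeclared unless a DTD supplies it, whereas the numeric character reference `&#174;` needs no declaration. Using the numeric form is one common remedy; whether the attached patch takes that route is not stated here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical reproduction; the XML snippets are made up, not copied from site.xml.
final class EntityCheck {

    /** Returns true if the XML is well-formed, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. an entity that was referenced but not declared
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<project name=\"Apache Tez&reg;\"/>"));
        System.out.println(parses("<project name=\"Apache Tez&#174;\"/>"));
    }
}
```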





[jira] [Updated] (TEZ-4204) Data race in RootInputInitializerManager

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4204:
--
Fix Version/s: (was: 0.10.1)

> Data race in RootInputInitializerManager
> 
>
> Key: TEZ-4204
> URL: https://issues.apache.org/jira/browse/TEZ-4204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Blocker
> Fix For: 0.10.0
>
> Attachments: TEZ-4204.1.patch, TEZ-4204.1.patch, TEZ-4204.2.patch
>
>
> After https://issues.apache.org/jira/browse/TEZ-4170 there is a data race for 
> initializerMap in RootInputInitializerManager. initializerMap should be 
> initialized before vertex state is set to initializing.





[jira] [Updated] (TEZ-4156) Fix Tez to reuse IPC connections

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4156:
--
Fix Version/s: (was: 0.10.1)

> Fix Tez to reuse IPC connections
> 
>
> Key: TEZ-4156
> URL: https://issues.apache.org/jira/browse/TEZ-4156
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4156.1.patch, TEZ-4156.2.patch, TEZ-4156.3.patch, 
> TEZ-4156.4.patch
>
>
> When tracking DAG progress, TezClientUtils ends up creating a new remote user. 
> Because of this new UGI creation, IPC connections are not reused internally.
> https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java#L965
> More info from Hadoop side:
> In hadoop's IPC layer, connectionIds are checked based on 
> UserGroupInformation.
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1600
> However, UserGroupInformation comparison is based on ==
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1789
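
Why identity-based equality defeats connection reuse can be shown with a toy cache. This is not Hadoop's IPC client: `ConnectionCacheDemo` and `User` are hypothetical stand-ins, and the real connection key contains more than the user. The point is only that a key which compares by reference never matches a freshly created, otherwise identical key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a connection cache keyed by a user object.
final class ConnectionCacheDemo {

    /** Stand-in for a user whose equality is identity-based (equals/hashCode not overridden). */
    static final class User {
        final String name;

        User(String name) {
            this.name = name;
        }
    }

    private final Map<User, Object> connections = new HashMap<>();
    private int created;

    /** Returns the cached connection for this user object, creating one on a cache miss. */
    Object getConnection(User user) {
        return connections.computeIfAbsent(user, u -> {
            created++;
            return new Object(); // placeholder for a real connection
        });
    }

    int connectionsCreated() {
        return created;
    }

    public static void main(String[] args) {
        ConnectionCacheDemo cache = new ConnectionCacheDemo();
        cache.getConnection(new User("alice"));
        cache.getConnection(new User("alice")); // same name, new object: cache miss
        User shared = new User("alice");
        cache.getConnection(shared);
        cache.getConnection(shared);            // same object: cache hit
        System.out.println(cache.connectionsCreated());
    }
}
```

Reusing one user object per client, rather than creating a new one per call, is what lets lookups of this kind hit the cache.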





[jira] [Updated] (TEZ-4088) Create in-memory ifile writer for transferring smaller payloads (follow up of TEZ-4075)

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4088:
--
Fix Version/s: (was: 0.10.1)

> Create in-memory ifile writer for transferring smaller payloads (follow up of 
> TEZ-4075)
> ---
>
> Key: TEZ-4088
> URL: https://issues.apache.org/jira/browse/TEZ-4088
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4088.1.patch, TEZ-4088.2.patch, TEZ-4088.3.patch, 
> TEZ-4088.5.patch, TEZ-4088.6.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> TEZ-4075 enabled data transfer over DME for smaller payloads. This helps in 
> reducing shuffle. 
> However, it still incurs disk IO cost (+flush) on the producer side. It would be 
> good to retain smaller payloads in memory, so that disk IO costs can be saved.





[jira] [Updated] (TEZ-4040) Upgrade RoaringBitmap version to avoid NoSuchMethodError

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4040:
--
Fix Version/s: (was: 0.10.1)

> Upgrade RoaringBitmap version to avoid NoSuchMethodError
> 
>
> Key: TEZ-4040
> URL: https://issues.apache.org/jira/browse/TEZ-4040
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: 0.4.9.api.txt, 0.5.11.api.txt, 0.5.21.api.txt, 
> TEZ-4040.001.patch, TEZ-4040.002.patch
>
>
> A common request is to use the runOptimize function, which is present in later 
> versions of RoaringBitmap.





[jira] [Updated] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4004:
--
Fix Version/s: (was: 0.10.1)

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4004.001-branch-0.9.patch, TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline&lang=java&l=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Updated] (TEZ-4206) TestSpeculation.testBasicSpeculationPerVertexConf is flaky

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4206:
--
Fix Version/s: (was: 0.10.1)

> TestSpeculation.testBasicSpeculationPerVertexConf is flaky
> --
>
> Key: TEZ-4206
> URL: https://issues.apache.org/jira/browse/TEZ-4206
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
> Fix For: 0.10.0, 0.9.3
>
> Attachments: TEZ-4206.1.patch
>
>
> The test is flaky due to a timing issue in MockDAGAppMaster's clock and 
> LegacySpeculator
> [https://builds.apache.org/job/PreCommit-TEZ-Build/491/]
> [https://builds.apache.org/job/PreCommit-TEZ-Build/492/]
> [https://builds.apache.org/job/PreCommit-TEZ-Build/493/]
>  





[jira] [Updated] (TEZ-4049) Fix findbugs issues in NotRunningJob

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4049:
--
Fix Version/s: (was: 0.10.1)

> Fix findbugs issues in NotRunningJob
> 
>
> Key: TEZ-4049
> URL: https://issues.apache.org/jira/browse/TEZ-4049
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4049.001.patch
>
>
> Introduced by TEZ-4035. Remove fixes while keeping 3.2.0 api compatibility. 





[jira] [Updated] (TEZ-4223) Adding new jars or resources after the first DAG runs does not work.

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4223:
--
Fix Version/s: (was: 0.10.1)

> Adding new jars or resources after the first DAG runs does not work.
> 
>
> Key: TEZ-4223
> URL: https://issues.apache.org/jira/browse/TEZ-4223
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
> Fix For: 0.10.0, 0.9.3
>
> Attachments: TEZ-4223.02.patch, TEZ-4223.03.patch, TEZ-4223.04.patch
>
>
> If we execute a DAG that needs additional jars after the first DAG has run, we 
> get a ClassNotFoundException.
>  
>  
> {noformat}
> 2020-08-03 13:57:14,776 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: 
> Added additional resources : 
> [[file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-pool-1.5.4.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/postgresql-42.2.8.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/hive-jdbc-handler-3.1.3000.7.2.2.0-73.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/mssql-jdbc-6.2.1.jre7.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-dbcp-1.4.jar]]
>  to classpath
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hive.storage.jdbc.JdbcInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
> ...
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hive.storage.jdbc.JdbcInputFormat
> at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
> at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:398)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
> ... 46 more{noformat}
>  
>  





[jira] [Updated] (TEZ-3972) Tez DAG can hang when a single task fails to fetch

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3972:
--
Fix Version/s: (was: 0.10.1)

> Tez DAG can hang when a single task fails to fetch
> --
>
> Key: TEZ-3972
> URL: https://issues.apache.org/jira/browse/TEZ-3972
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-3972.001.patch, TEZ-3972.002.patch, 
> TEZ-3972.003.patch
>
>
> Description of the hung DAG:
> A DAG with 2 vertices. {{Map}} Vertex has 22k maps, downstream vertex 
> {{Reduce}} has 1009 tasks. All tasks succeed but one, which hangs. This one 
> task (attempt) is doing a local fetch from a node that (now) has a bad disk. 
> It fails to fetch and reports to the AM for the offending input attempt 
> identifiers. However the AM does not schedule a re-run as 
> {{uniquefailedOutputReports}} size is 1 (since only this task attempt failed 
> to fetch) and failure fraction is not met. The denominator for this fraction 
> is the total number of tasks. That causes the re-run to never occur. This 
> JIRA tracks the AM side of the change to alleviate this problem.
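The arithmetic behind the hang can be sketched as follows. This is an illustration only: `shouldRerun` and the threshold value 0.1 are hypothetical and are not taken from the Tez source or from the patch; only the counts (one failed reporter, 1009 reduce tasks) come from the description above.

```java
public class FetchFailureFractionSketch {
    // Hypothetical check: re-run the producer only when the fraction of
    // distinct consumers reporting a fetch failure reaches the threshold.
    static boolean shouldRerun(int uniqueFailedReports, int denominator,
                               double maxAllowedFraction) {
        return denominator > 0
            && (double) uniqueFailedReports / denominator >= maxAllowedFraction;
    }

    public static void main(String[] args) {
        double threshold = 0.1; // assumed value, for illustration only

        // Denominator = total number of reduce tasks (1009):
        // 1 / 1009 is roughly 0.001, far below the threshold, so no re-run.
        System.out.println(shouldRerun(1, 1009, threshold)); // false

        // With a denominator of 1 the same single report is enough.
        System.out.println(shouldRerun(1, 1, threshold)); // true
    }
}
```

The second call only shows how sensitive the decision is to the choice of denominator; it should not be read as the fix that the attached patches implement.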





[jira] [Updated] (TEZ-4028) Events not visible from proto history logging for s3a filesystem until dag completes.

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4028:
--
Fix Version/s: (was: 0.10.1)

> Events not visible from proto history logging for s3a filesystem until dag 
> completes.
> -
>
> Key: TEZ-4028
> URL: https://issues.apache.org/jira/browse/TEZ-4028
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
>  Labels: history
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4028.01.patch, TEZ-4028.02.patch
>
>
> The events are not visible in the files because the s3 filesystem
> * flushes writes to local disk and only uploads/commits to s3 on close.
> * does not support append
> As an initial fix we log the dag submitted, initialized and started events 
> into a file and these can be read to get the dag plan, config from the AM. 
> The counters are not available until the dag completes anyway.
> The in-progress information cannot be read; this can be obtained from the AM 
> once we have the above events.





[jira] [Updated] (TEZ-3976) Batch ShuffleManager error report events

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-3976:
--
Fix Version/s: (was: 0.10.1)

> Batch ShuffleManager error report events
> 
>
> Key: TEZ-3976
> URL: https://issues.apache.org/jira/browse/TEZ-3976
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-3976.1.patch, TEZ-3976.2.patch, TEZ-3976.3.patch, 
> TEZ-3976.4.patch, TEZ-3976.5.patch, TEZ-3976.6.patch, TEZ-3976.7.patch, 
> TEZ-3976.8.patch, TEZ-3976.9.patch
>
>
> The symptom is that a lot of these log lines are being shown:
> {code:java}
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #0 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0], connectFailed: true
> 2018-06-15T18:09:35,811 WARN  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.Fetcher: copyInputs failed for 
> tasks [InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]]
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0], connectFailed: true
> {code}
> Each of those translates into an event in the AM, which finally crashes due to 
> OOM after around 30 minutes and around 10 million shuffle input errors (and 
> 10 million lines like the previous ones). When the ShuffleManager is closed 
> and the counters are reported, there are many shuffle input errors; some of those 
> logs are:
> {code:java}
> 2018-06-15T17:46:30,988  INFO [TezTR-441963_21_34_4_0_4 
> (152901963_0021_34_04_00_4)] runtime.LogicalIOProcessorRuntimeTask: 
> Final Counters for attempt_152901963_0021_34_04_00_4: Counters: 43 
> [[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, 
> NUM_SHUFFLED_INPUTS=26, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> INPUT_RECORDS_PROCESSED=26, OUTPUT_RECORDS=1, OUTPUT_LARGE_RECORDS=0, 
> OUTPUT_BYTES=779472, OUTPUT_BYTES_WITH_OVERHEAD=779483, 
> OUTPUT_BYTES_PHYSICAL=780146, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, 
> SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, 
> SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_BYTES_TO_DISK=0, 
> SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_PHASE_TIME=52516, 
> FIRST_EVENT_RECEIVED=1, LAST_EVENT_RECEIVED=1185][HIVE 
> RECORDS_OUT_INTERMEDIATE_Reducer_12=1, 
> RECORDS_OUT_OPERATOR_GBY_159=1, 
> RECORDS_OUT_OPERATOR_RS_160=1][TaskCounter_Reducer_12_INPUT_Map_11
>  FIRST_EVENT_RECEIVED=1, INPUT_RECORDS_PROCESSED=26, 
> LAST_EVENT_RECEIVED=1185, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> NUM_SHUFFLED_INPUTS=26, SHUFFLE_BYTES=4207563, 
> SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_DISK_DIRECT=826947, 
> SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=3380616, 
> SHUFFLE_PHASE_TIME=52516][TaskCounter_Reducer_12_OUTPUT_Map_1
>  ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_PHYSICAL=780146, 
> OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, 
> SPILLED_RECORDS=0]]
> 2018-06-15T17:46:32,271 INFO  [TezTR-441963_21_34_3_15_1 ()] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Final Counters for 
> attempt_152901963_0021_34_03_15_1: Counters: 87 [[File System 
> Counters FILE_BYTES_READ=0, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, 
> FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=2344929, 
> HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=5, HDFS_LARGE_READ_OPS=0, 
> HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter 
> SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=1, NUM_FAILED_SHUFFLE_INPUTS=105195, 
> INPUT_RECORDS_PROCESSED=397, INPUT_SPLIT_LENGTH_BYTES=21563271, 
> OUTPUT_RECORDS=15737, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=1235818, 
> OUTPUT_BYTES_WITH_OVERHEAD=1267307, OUTP

[jira] [Updated] (TEZ-4036) TestMockDAGAppMaster#testInternalPreemption should assert for failed state

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4036:
--
Fix Version/s: (was: 0.10.1)

> TestMockDAGAppMaster#testInternalPreemption should assert for failed state
> --
>
> Key: TEZ-4036
> URL: https://issues.apache.org/jira/browse/TEZ-4036
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4036.001.patch
>
>
> Due to the root cause mentioned in TEZ-3950, the test fails regularly. Until the 
> fix for that JIRA is in (which requires a good amount of redesign), this change 
> adds an assert for the failed state to the test, as this is now an expected state for the task.





[jira] [Updated] (TEZ-4086) Some Tez examples cannot work with outputPaths on a FS other than the default FS

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4086:
--
Fix Version/s: (was: 0.10.1)

> Some Tez examples cannot work with outputPaths on a FS other than the default 
> FS
> 
>
> Key: TEZ-4086
> URL: https://issues.apache.org/jira/browse/TEZ-4086
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4086.01.txt, TEZ-4086.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are several examples that make use of the FileSystem based on the 
> default config.
> This results in failure if the outputPath is on a different FileSystem. (e.g. 
> fs.defaultFS set to HDFS and outputPath for the example set to s3)
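The mismatch can be illustrated with plain URIs. This is only a sketch under stated assumptions: `schemeFor`, the host `namenode:8020` and the bucket name are hypothetical, and the code does not touch Hadoop at all. In Hadoop terms, the first lookup presumably corresponds to `FileSystem.get(conf)` (default config) and the second to `Path#getFileSystem(conf)` (resolved from the path itself).

```java
import java.net.URI;

public class OutputPathSchemeSketch {
    // Hypothetical helper: pick the filesystem scheme from the output path,
    // falling back to the default filesystem only when the path has none.
    static String schemeFor(URI outputPath, URI defaultFs) {
        String scheme = outputPath.getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020"); // assumed fs.defaultFS
        URI output = URI.create("s3://bucket/out");         // assumed outputPath

        // Using the default config alone ignores where the output actually lives.
        System.out.println(defaultFs.getScheme());          // hdfs

        // Resolving from the path picks the filesystem the path belongs to.
        System.out.println(schemeFor(output, defaultFs));   // s3

        // A path without a scheme still resolves against the default.
        System.out.println(schemeFor(URI.create("/tmp/out"), defaultFs)); // hdfs
    }
}
```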





[jira] [Updated] (TEZ-4058) Changes for 0.9.2 release

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4058:
--
Fix Version/s: (was: 0.10.1)

> Changes for 0.9.2 release
> -
>
> Key: TEZ-4058
> URL: https://issues.apache.org/jira/browse/TEZ-4058
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: TEZ-4058.001.patch
>
>
> Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release.





[jira] [Updated] (TEZ-1348) Allow Tez local mode to run against filesystems other than local FS

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-1348:
--
Fix Version/s: (was: 0.10.1)

> Allow Tez local mode to run against filesystems other than local FS
> ---
>
> Key: TEZ-1348
> URL: https://issues.apache.org/jira/browse/TEZ-1348
> Project: Apache Tez
>  Issue Type: Sub-task
> Environment: Committed to branch-0.9.
>Reporter: Siddharth Seth
>Assignee: Todd Lipcon
>Priority: Critical
> Fix For: 0.9.2, 0.10.0
>
> Attachments: tez-1348.patch, tez-1348.patch, tez-1348.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In TEZ-717, I incorrectly thought setting fs.defaultFS programmatically in 
> tez-site would work for local mode.
> Currently the requirement is that tez-site.xml must have fs.defaultFS set to 
> file:///.
> While that works, it doesn't allow for seamless execution in either 
> local-mode or on a cluster.
> The main issue here is that when Inputs / Outputs are configured - they use a 
> version of configuration which reads tez-site, and do not use the 
> configuration from the client itself (which is correct behaviour).
> Not sure what a good way to fix this is 
> 1) It may be possible to override this value each time an instance of 
> Configuration/TezConfiguration is created. One possible way would be to 
> statically add a default resource to Configuration the moment a local client 
> is created.
> 2) Provide information in the contexts on whether this is local or not. This 
> is fairly ugly, and would get in the way of running mixed mode tasks.
> Does anyone have other suggestions?





[jira] [Updated] (TEZ-4012) Add docker support for Tez.

2020-08-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4012:
--
Fix Version/s: (was: 0.10.1)

> Add docker support for Tez.
> ---
>
> Key: TEZ-4012
> URL: https://issues.apache.org/jira/browse/TEZ-4012
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4012.001.patch, TEZ-4012.002.patch, 
> TEZ-4012.003.patch
>
>
> Hadoop label builds contain a mix of development tools and versions. In 
> particular H11-H20 are unusable by tez since protoc -version is 2.6.x and 
> hadoop only supports 2.5.0. This jira will allow builds across all H1-H20 
> jenkins machines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

