[jira] [Commented] (TEZ-3992) Update commons-codec from 1.4 to 1.11

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975426#comment-16975426
 ] 

Jonathan Turner Eagles commented on TEZ-3992:
-

+1. [~abstractdog], now that Hadoop 3.0 has been EOL'd, it's safe to upgrade 
this dependency on the master branch. To that end, we also need to bump the 
master branch's Hadoop dependency to 3.1+. Thanks for this contribution.

> Update commons-codec from 1.4 to 1.11
> -
>
> Key: TEZ-3992
> URL: https://issues.apache.org/jira/browse/TEZ-3992
> Project: Apache Tez
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-3992.01.patch
>
>
> Commons codec 1.4 is from 2009, maybe we should try an update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (TEZ-3992) Update commons-codec from 1.4 to 1.11

2019-11-15 Thread Jonathan Turner Eagles (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles reassigned TEZ-3992:
---

Assignee: László Bodor

> Update commons-codec from 1.4 to 1.11
> -
>
> Key: TEZ-3992
> URL: https://issues.apache.org/jira/browse/TEZ-3992
> Project: Apache Tez
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-3992.01.patch
>
>
> Commons codec 1.4 is from 2009, maybe we should try an update.





[jira] [Comment Edited] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher

2019-11-15 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975416#comment-16975416
 ] 

Ahmed Hussein edited comment on TEZ-4067 at 11/15/19 9:34 PM:
--

Thanks Jon!
Sure, I will change that and create a new patch.


was (Author: ahussein):
Thanks Jon!Sure, I will change that and create a new patch.

> Tez Speculation decision is calculated on each update by the dispatcher
> ---
>
> Key: TEZ-4067
> URL: https://issues.apache.org/jira/browse/TEZ-4067
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch, 
> TEZ-4067.003.patch, TEZ-4067.004.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are 
> handled synchronously by the caller (dispatcher). This implies the following:
>  # the dispatcher spends a long time executing updateStatus as it needs to 
> check the runtime estimation of the tezAttempts within the vertex.
>  # the speculator is per stage: launching a speculation may not be the optimal 
> decision. Ideally, based on resources, speculated tasks should be the ones 
> with the slowest progress.
>  # the time between speculations is skewed because there is a big delay for 
> the dispatcher to complete a full cycle. Also, speculation will be more 
> aggressive compared to MR because MR waits for 
> "soonest.retry.after.speculate" whenever a task is speculated. On the other 
> hand, Tez speculates more tasks as it processes stages in parallel.
>  
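
To illustrate the problem described above, here is a minimal sketch of one way to keep the dispatcher thread cheap: status updates are only queued, and a separate thread drains the queue and evaluates the whole batch. This is an assumption about a possible direction, not what the attached patches do. AsyncSpeculatorSketch, onStatusUpdate and scanOnce are hypothetical names and do not exist in Tez.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; class and method names are not taken from Tez.
final class AsyncSpeculatorSketch {
  // Status updates handed over by the dispatcher thread.
  private final BlockingQueue<String> pendingUpdates = new LinkedBlockingQueue<>();

  // Called on the dispatcher thread: only enqueues, no runtime estimation.
  void onStatusUpdate(String attemptId) {
    pendingUpdates.offer(attemptId);
  }

  // Called on the speculator thread: drains everything queued so far and
  // evaluates it in one pass. Returns how many updates were consumed.
  int scanOnce() {
    List<String> batch = new ArrayList<>();
    pendingUpdates.drainTo(batch);
    // Placeholder: a real speculator would estimate runtimes for the batch
    // here and pick the slowest attempts as speculation candidates.
    return batch.size();
  }

  public static void main(String[] args) throws InterruptedException {
    AsyncSpeculatorSketch speculator = new AsyncSpeculatorSketch();
    speculator.onStatusUpdate("attempt_1");
    speculator.onStatusUpdate("attempt_2");
    speculator.onStatusUpdate("attempt_3");

    // The scan runs on its own thread, not on the caller (dispatcher) thread.
    Thread worker = new Thread(
        () -> System.out.println("updates scanned: " + speculator.scanOnce()),
        "speculator-sketch");
    worker.start();
    worker.join();
  }
}
```

With this shape, the cost of a status update on the dispatcher is a single queue insert, and the interval between scans is controlled by the speculator thread rather than by how long a dispatcher cycle takes.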





[jira] [Commented] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher

2019-11-15 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975416#comment-16975416
 ] 

Ahmed Hussein commented on TEZ-4067:


Thanks Jon! Sure, I will change that and create a new patch.

> Tez Speculation decision is calculated on each update by the dispatcher
> ---
>
> Key: TEZ-4067
> URL: https://issues.apache.org/jira/browse/TEZ-4067
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch, 
> TEZ-4067.003.patch, TEZ-4067.004.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are 
> handled synchronously by the caller (dispatcher). This implies the following:
>  # the dispatcher spends a long time executing updateStatus as it needs to 
> check the runtime estimation of the tezAttempts within the vertex.
>  # the speculator is per stage: launching a speculation may not be the optimal 
> decision. Ideally, based on resources, speculated tasks should be the ones 
> with the slowest progress.
>  # the time between speculations is skewed because there is a big delay for 
> the dispatcher to complete a full cycle. Also, speculation will be more 
> aggressive compared to MR because MR waits for 
> "soonest.retry.after.speculate" whenever a task is speculated. On the other 
> hand, Tez speculates more tasks as it processes stages in parallel.
>  





[jira] [Commented] (TEZ-4083) Upgrade to latest 9.4.x Jetty version

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975407#comment-16975407
 ] 

Jonathan Turner Eagles commented on TEZ-4083:
-

[~abstractdog], does it make sense to upgrade to 9.3.27.v20190418 instead? It 
also fixes the CVE but doesn't have the API changes. Then, when 3.3.0 is 
supported, we can upgrade to 9.4.x.

> Upgrade to latest 9.4.x Jetty version
> -
>
> Key: TEZ-4083
> URL: https://issues.apache.org/jira/browse/TEZ-4083
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Daniel Velasquez
> Assignee: László Bodor
> Priority: Major
> Attachments: TEZ-4083.01.patch
>
>
> Jetty 9.3.24.v20180605 has security vulnerabilities where the server is 
> vulnerable to XSS conditions.
> [https://www.cvedetails.com/cve/CVE-2019-10241/]
>  





[jira] [Commented] (TEZ-4084) Tez local mode fails when distributed cache creates link with parent

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975391#comment-16975391
 ] 

Jonathan Turner Eagles commented on TEZ-4084:
-

Thanks [~jtolar]. +1. Committed this change to master and branch-0.9

> Tez local mode fails when distributed cache creates link with parent
> 
>
> Key: TEZ-4084
> URL: https://issues.apache.org/jira/browse/TEZ-4084
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jacob Tolar
> Assignee: Jacob Tolar
> Priority: Minor
> Attachments: TEZ-4084.1.patch, TEZ-4084.2.patch, TEZ-4084.3.patch
>
>
> If you configure distributed cache with 'target#parent/link' in Tez Local 
> Mode, the file cannot be created and the job fails:
> {code:java}
> java.lang.IllegalArgumentException: Invalid prefix or suffix
>   at java.nio.file.TempFileHelper.generatePath(TempFileHelper.java:63)
>   at java.nio.file.TempFileHelper.create(TempFileHelper.java:127)
>   at java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:173)
>   at java.nio.file.Files.createTempDirectory(Files.java:950)
>   at org.apache.tez.dag.app.launcher.TezLocalCacheManager.localize(TezLocalCacheManager.java:103)
>  {code}
>  
> I propose: 
>  # Ensure the prefix is always valid (e.g. no path separators in it) when 
> creating the temporary copy of the file in TezLocalCacheManager
>  # Update tez local mode to behave the same way as mapreduce local mode in 
> this scenario. Mapreduce local mode also doesn't support these types of links 
> (links with a parent directory specified), but if it encounters them it is a 
> soft failure (WARN log message) not a job failure.
>  
> It is somewhat trickier to correctly support cache files linked with a 
> nonexistent parent; if that feature is required it can be done as a separate 
> JIRA.
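
As an illustration of the first proposal above, here is a minimal sketch of deriving a prefix without path separators before calling Files.createTempDirectory. It assumes the link name (for example 'parent/link') is what ends up in the prefix. TempPrefixSketch and safePrefix are hypothetical names, and this is not the code from the attached patches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; not the actual TezLocalCacheManager code.
final class TempPrefixSketch {
  // Keep only the last path component, so the prefix never contains a
  // path separator. Falls back to a fixed prefix if there is no file name.
  static String safePrefix(String linkName) {
    Path fileName = Paths.get(linkName).getFileName();
    return fileName == null ? "tez-" : fileName.toString();
  }

  public static void main(String[] args) throws IOException {
    // Passing "parent/link" directly as the prefix is the kind of input
    // that triggers "Invalid prefix or suffix" in the trace above.
    Path dir = Files.createTempDirectory(safePrefix("parent/link"));
    System.out.println("created: " + dir);
    Files.delete(dir);
  }
}
```

This only addresses the invalid prefix; whether the link itself should then be created or skipped with a WARN is the separate question raised in the second proposal.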





[jira] [Commented] (TEZ-4092) Elapsed time displayed in tez-ui is in format(UU.dd units e.g 1.24 hours) which is different from RM’s display format(D days, H hours, …)

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975373#comment-16975373
 ] 

Jonathan Turner Eagles commented on TEZ-4092:
-

The Tez UI was designed differently than the RM and is consistent with the 
Spark history server as well. Do you have some use cases that support this 
change? Currently, I see this as a purely aesthetic change. 

Not promoting this book, but it does have a relevant example of the spark 
history server as well.
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-history-server.html

> Elapsed time displayed in tez-ui is in format(UU.dd units e.g 1.24 hours) 
> which is different from RM’s display format(D days, H hours, …)
> -
>
> Key: TEZ-4092
> URL: https://issues.apache.org/jira/browse/TEZ-4092
> Project: Apache Tez
> Issue Type: Improvement
> Components: UI
> Reporter: Pulkit Sharma
> Priority: Major
> Attachments: TEZ-4092.01.patch, TEZ-4092.01.patch, 
> TEZ-4092.03-branch-0.8.patch
>
>
> *Issue:* Elapsed time displayed in tez-ui is in the format (UU.dd units, e.g. 
> 1.24 hours), which is different from RM’s display format (D days, H hours, …). 
> This is very confusing, and RM's display format is a widely used/accepted 
> format. This is also fixed in higher versions of Tez (0.9 onwards). 
> *Solution:* Make the time format in tez-ui the same as RM’s format (i.e. D 
> days, H hours, …)
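
For comparison, here is a small sketch of the two display styles discussed above, applied to the same duration. It is written in Java purely for illustration (the Tez UI itself is not Java). The units after "hours" in the breakdown style are an assumption, since the issue text elides them, and ElapsedFormatSketch is a hypothetical name.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; neither method is taken from tez-ui or the RM.
final class ElapsedFormatSketch {
  // Decimal style from the issue title, fixed to hours for simplicity.
  static String decimalStyle(long millis) {
    double hours = millis / 3_600_000.0;
    return String.format(Locale.ROOT, "%.2f hours", hours);
  }

  // Breakdown style ("D days, H hours, ..."); the minutes and seconds
  // parts are assumed here, the issue text does not spell them out.
  static String breakdownStyle(long millis) {
    long days = TimeUnit.MILLISECONDS.toDays(millis);
    long hours = TimeUnit.MILLISECONDS.toHours(millis) % 24;
    long minutes = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
    long seconds = TimeUnit.MILLISECONDS.toSeconds(millis) % 60;
    return days + " days, " + hours + " hours, "
        + minutes + " minutes, " + seconds + " seconds";
  }

  public static void main(String[] args) {
    long elapsed = 4_464_000L; // 1 hour, 14 minutes, 24 seconds
    System.out.println(decimalStyle(elapsed));   // 1.24 hours
    System.out.println(breakdownStyle(elapsed)); // 0 days, 1 hours, 14 minutes, 24 seconds
  }
}
```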





[jira] [Commented] (TEZ-4085) Tez UI resources vendor.js and tez-ui.js not getting minified in tez releases

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975327#comment-16975327
 ] 

Jonathan Turner Eagles commented on TEZ-4085:
-

+1. Checking this patch into master branch and branch-0.9.

> Tez UI resources vendor.js and tez-ui.js not getting minified in tez releases
> -
>
> Key: TEZ-4085
> URL: https://issues.apache.org/jira/browse/TEZ-4085
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Himanshu Mishra
> Priority: Minor
> Attachments: TEZ-4085.1.patch
>
>
> One of the ways to obtain {{tez-ui.war}} is from the Maven repo or the 
> releases page at [https://tez.apache.org/releases/].
> The tez-ui.war shipped as part of a release does not minify vendor.js and 
> tez-ui.js.





[jira] [Assigned] (TEZ-4085) Tez UI resources vendor.js and tez-ui.js not getting minified in tez releases

2019-11-15 Thread Jonathan Turner Eagles (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles reassigned TEZ-4085:
---

Assignee: Himanshu Mishra

> Tez UI resources vendor.js and tez-ui.js not getting minified in tez releases
> -
>
> Key: TEZ-4085
> URL: https://issues.apache.org/jira/browse/TEZ-4085
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Himanshu Mishra
> Assignee: Himanshu Mishra
> Priority: Minor
> Attachments: TEZ-4085.1.patch
>
>
> One of the ways to obtain {{tez-ui.war}} is from the Maven repo or the 
> releases page at [https://tez.apache.org/releases/].
> The tez-ui.war shipped as part of a release does not minify vendor.js and 
> tez-ui.js.





[jira] [Commented] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975248#comment-16975248
 ] 

Jonathan Turner Eagles commented on TEZ-2442:
-

[~shanyu], are you still working on this? [~ganeshas] is interested in 
contributing to this feature but wants to make sure. If you can reply in the 
next week, we'll be sure to have permissions before starting work.

> Support DFS based shuffle in addition to HTTP shuffle
> -
>
> Key: TEZ-2442
> URL: https://issues.apache.org/jira/browse/TEZ-2442
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.5.3
> Reporter: Kannan Rajah
> Assignee: shanyu zhao
> Priority: Major
> Attachments: FS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, 
> hdfs_broadcast_hack.txt, tez-2442-trunk.2.patch, tez-2442-trunk.3.patch, 
> tez-2442-trunk.4.patch, tez-2442-trunk.5.patch, tez-2442-trunk.patch, 
> tez_hdfs_shuffle.patch
>
>
> In Tez, Shuffle is a mechanism by which intermediate data can be shared 
> between stages. Shuffle data is written to local disk and fetched from any 
> remote node using HTTP. A DFS like the MapR file system can support writing 
> this shuffle data directly to its DFS using a notion of local volumes and 
> retrieve it using the HDFS API from a remote node. The current Shuffle 
> implementation assumes local data can only be managed by LocalFileSystem, so 
> it uses RawLocalFileSystem and LocalDirAllocator. If we can remove this 
> assumption and introduce an abstraction to manage local disks, then we can 
> reuse most of the shuffle logic (store, sort) and inject an HDFS API based 
> retrieval instead of HTTP.





[jira] [Commented] (TEZ-4089) Upgrade to hadoop 2.8.4 to make 0.9.x releases work with Hive 3.x

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975244#comment-16975244
 ] 

Jonathan Turner Eagles commented on TEZ-4089:
-

[~kgyrtkirk], I've updated the documentation. Please verify the changes. After 
the next release the minimal tarball will become available.

> Upgrade to hadoop 2.8.4 to make 0.9.x releases work with Hive 3.x
> -
>
> Key: TEZ-4089
> URL: https://issues.apache.org/jira/browse/TEZ-4089
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zoltan Haindrich
> Priority: Major
> Attachments: TEZ-4089.branch-0.9.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HADOOP-12209 slightly changed the API of the FileStatus class, and because 
> of that, applications compiled against hadoop >=2.8 can't run with tez 0.9.x 
> due to exceptions arising from the change.
> Right now I think it's not possible to deploy an upstream hive release (which 
> would use tez) without rebuilding tez along the way.
> http://mail-archives.apache.org/mod_mbox/tez-dev/201910.mbox/%3C0879b9ec-f334-2349-ca71-90ad053b20cd%40rxd.hu%3E





[jira] [Commented] (TEZ-3860) JDK9: ReflectionUtils may not use URLClassLoader

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975216#comment-16975216
 ] 

Jonathan Turner Eagles commented on TEZ-3860:
-

[~abstractdog], I'm trying to understand the findbugs warning. Is there a use 
case where this is important? Otherwise, we should list the exception for 
findbugs in the exceptions file.

> JDK9: ReflectionUtils may not use URLClassLoader
> 
>
> Key: TEZ-3860
> URL: https://issues.apache.org/jira/browse/TEZ-3860
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Zoltan Haindrich
> Assignee: László Bodor
> Priority: Major
> Attachments: TEZ-3860.01.patch, TEZ-3860.02.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The following code
> https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/common/ReflectionUtils.java#L125
> is not compatible with JDK9 since the classloader is an AppClassLoader, which
> causes exceptions like this:
> {code}
> java.lang.ClassCastException: java.base/jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to java.base/java.net.URLClassLoader
>   at org.apache.tez.common.ReflectionUtils.addResourcesToSystemClassLoader(ReflectionUtils.java:125) ~[tez-api-0.9.0.jar:0.9.0]
>   at org.apache.tez.dag.utils.RelocalizationUtils.addUrlsToClassPath(RelocalizationUtils.java:57) ~[tez-common-0.9.0.jar:0.9.0]
>   at org.apache.tez.dag.app.dag.impl.DAGImpl$StartTransition.transition(DAGImpl.java:1793) ~[tez-dag-0.9.0.jar:0.9.0]
>   at org.apache.tez.dag.app.dag.impl.DAGImpl$StartTransition.transition(DAGImpl.java:1776) ~[tez-dag-0.9.0.jar:0.9.0]
>   at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) ~[hadoop-yarn-common-2.8.1.jar:?]
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) ~[hadoop-yarn-common-2.8.1.jar:?]
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) ~[hadoop-yarn-common-2.8.1.jar:?]
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) ~[hadoop-yarn-common-2.8.1.jar:?]
>   at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59) ~[tez-dag-0.9.0.jar:0.9.0]
>   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1156) [tez-dag-0.9.0.jar:0.9.0]
>   at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:147) [tez-dag-0.9.0.jar:0.9.0]
>   at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251) [tez-dag-0.9.0.jar:0.9.0]
>   at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242) [tez-dag-0.9.0.jar:0.9.0]
>   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180) [tez-common-0.9.0.jar:0.9.0]
>   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115) [tez-common-0.9.0.jar:0.9.0]
>   at java.base/java.lang.Thread.run(Thread.java:844) [?:?]
> {code}
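
To make the failure above concrete, here is a minimal sketch that checks whether the system class loader is a URLClassLoader and, instead of casting it, wraps it in a child URLClassLoader. This is one possible workaround under stated assumptions, not necessarily what the attached patches do. ClassLoaderSketch and childLoaderFor are hypothetical names.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch; not the actual ReflectionUtils code.
final class ClassLoaderSketch {
  // Typically true on JDK 8 and false on JDK 9+, where the system loader
  // is jdk.internal.loader.ClassLoaders$AppClassLoader.
  static boolean systemLoaderIsUrlClassLoader() {
    return ClassLoader.getSystemClassLoader() instanceof URLClassLoader;
  }

  // Wrap rather than cast: a new loader that serves the extra URLs and
  // delegates everything else to the given parent.
  static URLClassLoader childLoaderFor(URL[] urls, ClassLoader parent) {
    return new URLClassLoader(urls, parent);
  }

  public static void main(String[] args) {
    System.out.println("system loader is a URLClassLoader: "
        + systemLoaderIsUrlClassLoader());

    ClassLoader system = ClassLoader.getSystemClassLoader();
    URLClassLoader child = childLoaderFor(new URL[0], system);
    // Code that resolves classes through the context class loader will now
    // also see the child's URLs (none in this demo).
    Thread.currentThread().setContextClassLoader(child);
    System.out.println("child delegates to system loader: "
        + (child.getParent() == system));
  }
}
```

One caveat with this approach: resources added to the child loader are only visible to code that looks classes up through that loader (for example via the thread context class loader), not to code that goes through the system class loader directly.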





[jira] [Commented] (TEZ-4093) Add hadoop-ozone-filesystem jar to ozone profile

2019-11-15 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975210#comment-16975210
 ] 

Jonathan Turner Eagles commented on TEZ-4093:
-

It makes little sense for tez to add this dependency as tez has no direct 
dependency on this. The best practice is for distributions to make this change 
or to rely on the hadoop distribution dependencies to provide this.

> Add hadoop-ozone-filesystem jar to ozone profile
> 
>
> Key: TEZ-4093
> URL: https://issues.apache.org/jira/browse/TEZ-4093
> Project: Apache Tez
> Issue Type: Task
> Affects Versions: 0.9.2
> Reporter: Vivek Ratnavel Subramanian
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Tez should include Ozone filesystem jar with a new profile "ozone" in mvn.


