[jira] [Created] (TEZ-4224) Add Laszlo Bodor's public key to KEYS
László Bodor created TEZ-4224: - Summary: Add Laszlo Bodor's public key to KEYS Key: TEZ-4224 URL: https://issues.apache.org/jira/browse/TEZ-4224 Project: Apache Tez Issue Type: Improvement Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (TEZ-4224) Add Laszlo Bodor's public key to KEYS
[ https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned TEZ-4224: - Assignee: László Bodor > Add Laszlo Bodor's public key to KEYS > - > > Key: TEZ-4224 > URL: https://issues.apache.org/jira/browse/TEZ-4224 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles
[ https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183937#comment-17183937 ] László Bodor edited comment on TEZ-3645 at 8/25/20, 11:08 AM: -- [~jeagles]: this is marked as a 0.10 blocker, is there anything I can help with on this patch? (this week I can address the latest comments in a new patch if it helps) was (Author: abstractdog): this is marked as a 0.10 blocker, is there anything I can help with on this patch? (this week I can address the latest comments if it helps) > Reuse SerializationFactory while sorting, merging, and writing IFiles > -- > > Key: TEZ-3645 > URL: https://issues.apache.org/jira/browse/TEZ-3645 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Labels: 0.10_blocker > Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, > TEZ-3645.1.patch, TEZ-3645.2.patch > > > Of course this is not reusing the serializer, just the SerializationFactory > and Serialization. They are jointly responsible for iterating over the list > of available serializers and finding an acceptable one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles
[ https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183937#comment-17183937 ] László Bodor commented on TEZ-3645: --- this is marked as a 0.10 blocker, is there anything I can help with on this patch? (this week I can address the latest comments if it helps) > Reuse SerializationFactory while sorting, merging, and writing IFiles > -- > > Key: TEZ-3645 > URL: https://issues.apache.org/jira/browse/TEZ-3645 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Labels: 0.10_blocker > Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, > TEZ-3645.1.patch, TEZ-3645.2.patch > > > Of course this is not reusing the serializer, just the SerializationFactory > and Serialization. They are jointly responsible for iterating over the list > of available serializers and finding an acceptable one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4224) Add Laszlo Bodor's public key to KEYS
[ https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4224: -- Attachment: TEZ-4224.01.patch > Add Laszlo Bodor's public key to KEYS > - > > Key: TEZ-4224 > URL: https://issues.apache.org/jira/browse/TEZ-4224 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: TEZ-4224.01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-4224) Add Laszlo Bodor's public key to KEYS
[ https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183945#comment-17183945 ] László Bodor commented on TEZ-4224: --- [~jeagles]: could you please take a look? > Add Laszlo Bodor's public key to KEYS > - > > Key: TEZ-4224 > URL: https://issues.apache.org/jira/browse/TEZ-4224 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: TEZ-4224.01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-4213) Bound appContext executor capacity using a configurable property
[ https://issues.apache.org/jira/browse/TEZ-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183969#comment-17183969 ] László Bodor commented on TEZ-4213: --- forgot to double-check the checkstyle warning, fixed in addendum commit: https://github.com/apache/tez/commit/99895f9808170ce64fd1e7c6dfb2e932f4578489 > Bound appContext executor capacity using a configurable property > > > Key: TEZ-4213 > URL: https://issues.apache.org/jira/browse/TEZ-4213 > Project: Apache Tez > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: 0.10_blocker > Fix For: 0.10.1 > > Attachments: TEZ-4213.01.patch, TEZ-4213.02.patch, TEZ-4213.03.patch, > TEZ-4213.04.patch, TEZ-4213.05.patch, TEZ-4213.06.patch, TEZ-4213.07.patch, > TEZ-4213.08.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > After TEZ-4170 was merged, the appContext executor pool is also used by the > RootInputInitializerManager to speed up SplitGeneration. > However, this executor pool currently has no capacity limit > https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L624 > The problem that occurs when generating splits for larger inputs (thousands or > more) is that it can result in > {color:red}java.lang.OutOfMemoryError{color} > that is also reproducible with a test case. > https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L130 > To avoid such errors, I propose limiting the capacity of this pool to a > configurable value, for example the number of physical cores by > default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
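A minimal sketch of the kind of bound proposed in TEZ-4213 above, not the committed patch. The class and method names (BoundedExecutorSketch, resolveCapacity, newBoundedPool, peakConcurrency) are hypothetical, and falling back to Runtime.availableProcessors() is an assumption: it reports logical processors, which only approximates the "physical cores" default mentioned in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedExecutorSketch {

  // Hypothetical helper: use the configured value if positive,
  // otherwise fall back to the number of available processors.
  public static int resolveCapacity(int configured) {
    return configured > 0 ? configured : Runtime.getRuntime().availableProcessors();
  }

  // A fixed-size pool never runs more than `capacity` tasks at once;
  // extra tasks wait in the pool's queue instead of spawning threads.
  public static ExecutorService newBoundedPool(int configured) {
    return Executors.newFixedThreadPool(resolveCapacity(configured));
  }

  // Submits `tasks` short tasks and returns the highest number
  // that were observed running at the same time.
  public static int peakConcurrency(int capacity, int tasks) {
    ExecutorService pool = newBoundedPool(capacity);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int i = 0; i < tasks; i++) {
        futures.add(pool.submit(() -> {
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(2);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          running.decrementAndGet();
        }));
      }
      for (Future<?> f : futures) {
        f.get();
      }
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (Exception e) {
      pool.shutdownNow();
      throw new IllegalStateException(e);
    }
    return peak.get();
  }

  public static void main(String[] args) {
    System.out.println("peak concurrency with capacity 4: " + peakConcurrency(4, 100));
  }
}
```

The sketch bounds only the number of threads; the queue of a fixed thread pool is still unbounded, so a real fix may also need to limit how many tasks are queued.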
[jira] [Updated] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles
[ https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3645: -- Attachment: TEZ-3645.005.patch > Reuse SerializationFactory while sorting, merging, and writing IFiles > -- > > Key: TEZ-3645 > URL: https://issues.apache.org/jira/browse/TEZ-3645 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Labels: 0.10_blocker > Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, > TEZ-3645.005.patch, TEZ-3645.1.patch, TEZ-3645.2.patch > > > Of course this is not reusing the serializer, just the SerializationFactory > and Serialization. They are jointly responsible for iterating over the list > of available serializers and finding an acceptable one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3645) Reuse SerializationFactory while sorting, merging, and writing IFiles
[ https://issues.apache.org/jira/browse/TEZ-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184056#comment-17184056 ] László Bodor commented on TEZ-3645: --- uploaded [^TEZ-3645.005.patch] with the changes: introduced SerializationContext and init it in MergeManager's constructor > Reuse SerializationFactory while sorting, merging, and writing IFiles > -- > > Key: TEZ-3645 > URL: https://issues.apache.org/jira/browse/TEZ-3645 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Labels: 0.10_blocker > Attachments: TEZ-3645.003.patch, TEZ-3645.004.patch, > TEZ-3645.005.patch, TEZ-3645.1.patch, TEZ-3645.2.patch > > > Of course this is not reusing the serializer, just the SerializationFactory > and Serialization. They are jointly responsible for iterating over the list > of available serializers and finding an acceptable one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-4224) Add Laszlo Bodor's public key to KEYS
[ https://issues.apache.org/jira/browse/TEZ-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184093#comment-17184093 ] Jonathan Turner Eagles commented on TEZ-4224: - Can you post a link to the public key server used to verify this public key? Also, be careful never to lose the private key. Keep a backup of the key in a secure location. If the key is compromised, we can go through the process of updating the key. > Add Laszlo Bodor's public key to KEYS > - > > Key: TEZ-4224 > URL: https://issues.apache.org/jira/browse/TEZ-4224 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: TEZ-4224.01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4043) Create a yetus compatible checkstyle configuration
[ https://issues.apache.org/jira/browse/TEZ-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4043: -- Fix Version/s: 0.10.0 > Create a yetus compatible checkstyle configuration > -- > > Key: TEZ-4043 > URL: https://issues.apache.org/jira/browse/TEZ-4043 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4043.001.patch, TEZ-4043.002.patch > > > Tez follows Hadoop source code guidelines with the exception of 120 character > line length. > http://maven.apache.org/plugins/maven-checkstyle-plugin/examples/multi-module-config.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3994) Upgrade maven-surefire-plugin to 0.21.0 to support yetus
[ https://issues.apache.org/jira/browse/TEZ-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3994: -- Fix Version/s: 0.10.0 > Upgrade maven-surefire-plugin to 0.21.0 to support yetus > > > Key: TEZ-3994 > URL: https://issues.apache.org/jira/browse/TEZ-3994 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-3994.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4022) Upgrade Maven Surefire plugin to 3.0.0-M1
[ https://issues.apache.org/jira/browse/TEZ-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4022: -- Fix Version/s: 0.10.0 > Upgrade Maven Surefire plugin to 3.0.0-M1 > - > > Key: TEZ-4022 > URL: https://issues.apache.org/jira/browse/TEZ-4022 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4022.001.patch > > > Recently all the unit tests are failing. This is caused by the latest Java 8 > issue reported at SUREFIRE-1588 and fixed in Maven Surefire plugin 3.0.0-M1. > We need to update the plugin. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3957) Report TASK_DURATION_MILLIS as a Counter for completed tasks
[ https://issues.apache.org/jira/browse/TEZ-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3957: -- Fix Version/s: 0.10.0 > Report TASK_DURATION_MILLIS as a Counter for completed tasks > > > Key: TEZ-3957 > URL: https://issues.apache.org/jira/browse/TEZ-3957 > Project: Apache Tez > Issue Type: Improvement >Reporter: Eric Wohlstadter >Assignee: Sergey Shelukhin >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-3957.01.patch, TEZ-3957.02.patch, TEZ-3957.02.patch, > TEZ-3957.03.patch, TEZ-3957.patch > > > timeTaken is already being reported by {{TaskAttemptFinishedEvent}}, but not > as a Counter. > Combined with TEZ-3911, this provides min(timeTaken), max(timeTaken), > avg(timeTaken). > The value will be: {{finishTime - launchTime}} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
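A small sketch of the arithmetic described in TEZ-3957 above: each completed task contributes finishTime - launchTime, and the per-task values are then aggregated into min, max and avg. The class name TaskDurationSketch and the {launchTime, finishTime} array layout are assumptions made for illustration; this does not use the Tez counter API.

```java
import java.util.LongSummaryStatistics;

public class TaskDurationSketch {

  // The value reported per completed task, as stated in the issue.
  public static long durationMillis(long launchTime, long finishTime) {
    return finishTime - launchTime;
  }

  // Each row is {launchTime, finishTime} in milliseconds (hypothetical layout).
  public static LongSummaryStatistics aggregate(long[][] attempts) {
    LongSummaryStatistics stats = new LongSummaryStatistics();
    for (long[] attempt : attempts) {
      stats.accept(durationMillis(attempt[0], attempt[1]));
    }
    return stats;
  }

  public static void main(String[] args) {
    long[][] attempts = {{1000L, 1500L}, {2000L, 2300L}, {0L, 100L}};
    LongSummaryStatistics stats = aggregate(attempts);
    System.out.println("min=" + stats.getMin()
        + " max=" + stats.getMax()
        + " avg=" + stats.getAverage());
  }
}
```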
[jira] [Updated] (TEZ-4028) Events not visible from proto history logging for s3a filesystem until dag completes.
[ https://issues.apache.org/jira/browse/TEZ-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4028: -- Fix Version/s: 0.10.0 > Events not visible from proto history logging for s3a filesystem until dag > completes. > - > > Key: TEZ-4028 > URL: https://issues.apache.org/jira/browse/TEZ-4028 > Project: Apache Tez > Issue Type: Bug >Reporter: Harish JP >Assignee: Harish JP >Priority: Major > Labels: history > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4028.01.patch, TEZ-4028.02.patch > > > The events are not visible in the files because the s3 filesystem > * flushes writes to local disk and only uploads/commits to s3 on close. > * does not support append > As an initial fix we log the dag submitted, initialized and started events > into a file and these can be read to get the dag plan, config from the AM. > The counters are not available until the dag completes anyway. > The in-progress information cannot be read; it can be obtained from the AM > once we have the above events. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3975) Please add OWASP Dependency Check to the build (pom.xml)
[ https://issues.apache.org/jira/browse/TEZ-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3975: -- Fix Version/s: 0.10.0 > Please add OWASP Dependency Check to the build (pom.xml) > > > Key: TEZ-3975 > URL: https://issues.apache.org/jira/browse/TEZ-3975 > Project: Apache Tez > Issue Type: New Feature >Affects Versions: 0.8.next, 0.10.0, 0.10.1 > Environment: All development, build, test, environments. >Reporter: Albert Baker >Assignee: Jonathan Turner Eagles >Priority: Major > Labels: build, easy-fix, security > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-3975.001.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Please add OWASP Dependency Check to the build (pom.xml). OWASP DC makes an > outbound REST call to MITRE Common Vulnerabilities & Exposures (CVE) to > perform a lookup for each dependent .jar to list any/all known > vulnerabilities for each jar. This step is needed because a manual MITRE CVE > lookup/check on the main component does not include checking for > vulnerabilities in components or in dependent libraries. > OWASP Dependency check : > https://www.owasp.org/index.php/OWASP_Dependency_Check has plug-ins for most > Java build/make types (ant, maven, ivy, gradle). > Also, add the appropriate command to the nightly build to generate a report > of all known vulnerabilities in any/all third party libraries/dependencies > that get pulled in. example : mvn -Powasp -Dtest=false -DfailIfNoTests=false > clean aggregate > Generating this report nightly/weekly will help inform the project's > development team if any dependent libraries have reported known > vulnerabilities. Project teams that keep up with removing vulnerabilities on > a weekly basis will help protect businesses that rely on these open source > components. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4058) Changes for 0.9.2 release
[ https://issues.apache.org/jira/browse/TEZ-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4058: -- Fix Version/s: 0.10.0 > Changes for 0.9.2 release > - > > Key: TEZ-4058 > URL: https://issues.apache.org/jira/browse/TEZ-4058 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4058.001.patch > > > Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4223) Adding new jars or resources after the first DAG runs does not work.
[ https://issues.apache.org/jira/browse/TEZ-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4223: -- Fix Version/s: 0.10.0 > Adding new jars or resources after the first DAG runs does not work. > > > Key: TEZ-4223 > URL: https://issues.apache.org/jira/browse/TEZ-4223 > Project: Apache Tez > Issue Type: Bug >Reporter: Harish JP >Assignee: Harish JP >Priority: Major > Fix For: 0.10.0, 0.10.1, 0.9.3 > > Attachments: TEZ-4223.02.patch, TEZ-4223.03.patch, TEZ-4223.04.patch > > > If we execute a DAG that needs additional jars after the first DAG has run, we > get a ClassNotFoundException. > > > {noformat} > 2020-08-03 13:57:14,776 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: > Added additional resources : > [[file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-pool-1.5.4.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/postgresql-42.2.8.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/hive-jdbc-handler-3.1.3000.7.2.2.0-73.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/mssql-jdbc-6.2.1.jre7.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-dbcp-1.4.jar]] > to classpath > org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find > class: org.apache.hive.storage.jdbc.JdbcInputFormat > Serialization trace: > inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc) > aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156) > at > 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133) > ... > ... > Caused by: java.lang.ClassNotFoundException: > org.apache.hive.storage.jdbc.JdbcInputFormat > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154) > ... 46 more{noformat} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4042) Speculative attempts should avoid running on the same node
[ https://issues.apache.org/jira/browse/TEZ-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4042: -- Fix Version/s: 0.10.0 > Speculative attempts should avoid running on the same node > -- > > Key: TEZ-4042 > URL: https://issues.apache.org/jira/browse/TEZ-4042 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Ying Han >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4042.001.patch > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4115) turn on data-via-events as default
[ https://issues.apache.org/jira/browse/TEZ-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4115: -- Fix Version/s: 0.10.0 > turn on data-via-events as default > -- > > Key: TEZ-4115 > URL: https://issues.apache.org/jira/browse/TEZ-4115 > Project: Apache Tez > Issue Type: Bug >Reporter: Richard Zhang >Assignee: Richard Zhang >Priority: Minor > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4115.1.patch > > > tez.runtime.transfer.data-via-events.enabled will be set to true by > default -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4047) Tez trademark in xml is causing xml parsing issue
[ https://issues.apache.org/jira/browse/TEZ-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4047: -- Fix Version/s: 0.10.0 > Tez trademark in xml is causing xml parsing issue > - > > Key: TEZ-4047 > URL: https://issues.apache.org/jira/browse/TEZ-4047 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4047.001.patch > > > {code} > docs/src/site/site.xml: > [Fatal Error] site.xml:97:34: The entity "reg" was referenced, but not > declared. > java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: > file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: > 34; The entity "reg" was referenced, but not declared. > at > jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397) > at > jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:449) > at > jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:406) > at > jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402) > at > jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155) > at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264) > at com.sun.tools.script.shell.Main.evaluateString(Main.java:298) > at com.sun.tools.script.shell.Main.evaluateString(Main.java:319) > at com.sun.tools.script.shell.Main.access$300(Main.java:37) > at com.sun.tools.script.shell.Main$3.run(Main.java:217) > at com.sun.tools.script.shell.Main.main(Main.java:48) > Caused by: org.xml.sax.SAXParseException; systemId: > file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: > 34; The entity "reg" was referenced, but not declared. 
> at > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257) > at > com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339) > at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205) > at > jdk.nashorn.internal.scripts.Script$Recompilation$2$19313A$\^system_init\_.XMLDocument(:747) > at jdk.nashorn.internal.scripts.Script$1$\^string\_.:program(:1) > at > jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637) > at > jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494) > at > jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393) > ... 10 more > {code} > Also output from xmllint verifies xml issue as well. > {code} > xmllint ./docs/src/site/site.xml > .//src/site/site.xml:97: parser error : Entity 'reg' not defined > http://tez.apache.org/"/> > ^ > .//src/site/site.xml:123: parser error : Entity 'reg' not defined > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
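A minimal sketch that reproduces the class of error reported in TEZ-4047 above: without a DTD, XML predefines only a handful of entities, so a reference such as &reg; is rejected, while the numeric character reference &#174; is accepted. The sample markup and the class name EntityCheckSketch are hypothetical (the real site.xml content is not shown in the thread), and whether the attached patch uses a numeric reference is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityCheckSketch {

  // Returns true if the string is well-formed XML, false if the parser rejects it.
  public static boolean parses(String xml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors without printing them to stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (Exception e) {
      // Parser configuration or I/O problems are unrelated to the entity check.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical markup, only meant to show the two spellings of the trademark sign.
    String named = "<project name=\"Apache Tez&reg;\"/>";
    String numeric = "<project name=\"Apache Tez&#174;\"/>";
    System.out.println("named entity parses:   " + parses(named));
    System.out.println("numeric entity parses: " + parses(numeric));
  }
}
```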
[jira] [Updated] (TEZ-4076) Add hadoop-cloud-storage jar to aws and azure mvn profiles
[ https://issues.apache.org/jira/browse/TEZ-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4076: -- Fix Version/s: 0.10.0 > Add hadoop-cloud-storage jar to aws and azure mvn profiles > -- > > Key: TEZ-4076 > URL: https://issues.apache.org/jira/browse/TEZ-4076 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4076.patch > > Time Spent: 10m > Remaining Estimate: 0h > > It would make sense to include the dependencies in the > {{hadoop-cloud-storage}} jar file when choosing aws or azure profiles. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3952) Allow Tez task speculation to grant greater customization of certain parameters
[ https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3952: -- Fix Version/s: 0.10.0 > Allow Tez task speculation to grant greater customization of certain > parameters > --- > > Key: TEZ-3952 > URL: https://issues.apache.org/jira/browse/TEZ-3952 > Project: Apache Tez > Issue Type: Improvement >Reporter: Nishant Dash >Assignee: Nishant Dash >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-3952.001.patch, TEZ-3952.002.patch, > TEZ-3952.003.patch, TEZ-3952.004.patch, TEZ-3952.005.patch, TEZ-3952.006.patch > > > Many of the settings for Tez task speculation are hardcoded and should > instead be configurable. For example, there are no equivalent config settings > for the following MapReduce settings: > - mapreduce.job.speculative.speculative-cap-running-tasks > - mapreduce.job.speculative.retry-after-no-speculate > - mapreduce.job.speculative.retry-after-speculate > - mapreduce.job.speculative.minimum-allowed-tasks > - mapreduce.job.speculative.speculative-cap-total-tasks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4146) Register RUNNING state in DAG's state change callback
[ https://issues.apache.org/jira/browse/TEZ-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4146: -- Fix Version/s: 0.10.0 > Register RUNNING state in DAG's state change callback > - > > Key: TEZ-4146 > URL: https://issues.apache.org/jira/browse/TEZ-4146 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 0.10.0, 0.10.1, 0.9.3 > > Attachments: TEZ-4146.1.patch, TEZ-4146.2.patch, TEZ-4146.3.patch > > > It would be good to register RUNNING in the DAG state change callbacks. This > would help applications like Hive, which [monitors the > job|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java#L182] > continuously to get a runtime breakdown at the end of the job. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4052) Fit dot files ASF License issues - part 2
[ https://issues.apache.org/jira/browse/TEZ-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4052: -- Fix Version/s: 0.10.0 > Fit dot files ASF License issues - part 2 > - > > Key: TEZ-4052 > URL: https://issues.apache.org/jira/browse/TEZ-4052 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4052.001.patch > > > Continuing the effort in TEZ-3995. > https://issues.apache.org/jira/browse/TEZ-3995?focusedCommentId=16784595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16784595 > {code} > 1) Please extend this to tez-ext-service-tests 2) Also, please consider > directory tez.log.dir with path ${project.build.directory}/logs. > {code} > This jira is to make sure all dot files are correctly placed under the target > directory, so that 1) files aren't created outside the build directory > and 2) they are named as part of a broader test directory design -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4040) Upgrade RoaringBitmap version to avoid NoSuchMethodError
[ https://issues.apache.org/jira/browse/TEZ-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4040: -- Fix Version/s: 0.10.0 > Upgrade RoaringBitmap version to avoid NoSuchMethodError > > > Key: TEZ-4040 > URL: https://issues.apache.org/jira/browse/TEZ-4040 > Project: Apache Tez > Issue Type: Task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: 0.4.9.api.txt, 0.5.11.api.txt, 0.5.21.api.txt, > TEZ-4040.001.patch, TEZ-4040.002.patch > > > A common request is to use the runOptimize function, which is present in later > versions of roaringbitmap -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4096) SSLFactory should pickup configs from incoming conf payload
[ https://issues.apache.org/jira/browse/TEZ-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4096: -- Fix Version/s: 0.10.0 > SSLFactory should pickup configs from incoming conf payload > --- > > Key: TEZ-4096 > URL: https://issues.apache.org/jira/browse/TEZ-4096 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4096.1.patch, TEZ-4096.2.patch, TEZ-4096.3.patch > > > SSLFactory uses "String" instead of "Path" for adding "ssl-client.xml". When > addResource is invoked with a string, {{Configuration}} tries to find it in the > classloader and does not load the file correctly. > [https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/http/SSLFactory.java#L107] > Conf: > [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L3064] > This creates an issue when ssl-client.xml is located in a path outside > the classpath. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4091) UnorderedPartitionedKVWriter::readDataForDME should check if in-mem file is flushed or not
[ https://issues.apache.org/jira/browse/TEZ-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4091: -- Fix Version/s: 0.10.0 > UnorderedPartitionedKVWriter::readDataForDME should check if in-mem file is > flushed or not > -- > > Key: TEZ-4091 > URL: https://issues.apache.org/jira/browse/TEZ-4091 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4091.1.patch, TEZ-4091.2.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > It is possible that the in-mem cache flushed out the data to file. Before > sending the data over wire, it would be good to check if the data got flushed > out to disk. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped
[ https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3990: -- Fix Version/s: 0.10.0 > The number of shuffle penalties for a host/inputAttemptIdentifier should be > capped > -- > > Key: TEZ-3990 > URL: https://issues.apache.org/jira/browse/TEZ-3990 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1, 0.10.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, > TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, TEZ-3990.006.patch > > > In a scenario where the same mapId fetches fail, the penalty code allows > adding the same Host/InputAttemptIdentifier over and over with revised > penalty time that grows exponentially. It should at some point drop the > retrying and report failure to the AM asap to allow the job to rectify the > upstream output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
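A minimal sketch of the capping idea described in TEZ-3990 above: the penalty for a given host/input attempt still grows exponentially, but after a fixed number of penalties the caller is told to stop retrying and report the failure. The class PenaltyCapSketch, the string key, the base delay and the -1 return convention are all hypothetical; this is not the shuffle scheduler code from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

public class PenaltyCapSketch {

  // Return value meaning "cap reached, report the failure instead of retrying".
  public static final long GIVE_UP = -1L;

  private final int maxPenalties;
  private final long baseDelayMillis;
  private final Map<String, Integer> penaltyCounts = new HashMap<>();

  public PenaltyCapSketch(int maxPenalties, long baseDelayMillis) {
    this.maxPenalties = maxPenalties;
    this.baseDelayMillis = baseDelayMillis;
  }

  // Records one more failure for the key and returns the next penalty delay,
  // doubling each time, or GIVE_UP once the key was penalized maxPenalties times.
  public long penalize(String hostAndAttempt) {
    int count = penaltyCounts.merge(hostAndAttempt, 1, Integer::sum);
    if (count > maxPenalties) {
      return GIVE_UP;
    }
    // Assumes a small cap, so the shift cannot overflow a long.
    return baseDelayMillis << (count - 1);
  }

  public static void main(String[] args) {
    PenaltyCapSketch penalties = new PenaltyCapSketch(3, 100L);
    for (int i = 0; i < 4; i++) {
      System.out.println(penalties.penalize("host1/attempt_0"));
    }
  }
}
```

Keeping the count per key means one failing input does not affect the retry budget of other hosts or attempts.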
[jira] [Updated] (TEZ-4066) Upgrade servlet-api from 2.5 to 3.1.0
[ https://issues.apache.org/jira/browse/TEZ-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4066: -- Fix Version/s: 0.10.0 > Upgrade servlet-api from 2.5 to 3.1.0 > - > > Key: TEZ-4066 > URL: https://issues.apache.org/jira/browse/TEZ-4066 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4066.001.patch, TEZ-4066.002.patch > > > Oozie launcher jobs trying to launch Tez jobs now fail to render Oozie > Launcher Job AM due to both 2.5 (from tez) and 3.1.0 (from hadoop) > servlet-api both being in the classpath. Tez should sync with servlet api > version from tez master branch that only supports hadoop 3+ > {code} > 2019-04-30 14:53:02,747 WARN [qtp1213419524-119] > org.eclipse.jetty.server.HttpChannel: > java.lang.NoSuchMethodError: > javax.servlet.http.HttpServletRequest.isAsyncStarted()Z > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:688) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > at > org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at 
org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4174) [Kubernetes] Fetcher should connection failure on SocketException
[ https://issues.apache.org/jira/browse/TEZ-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4174: -- Fix Version/s: 0.10.0 > [Kubernetes] Fetcher should connection failure on SocketException > - > > Key: TEZ-4174 > URL: https://issues.apache.org/jira/browse/TEZ-4174 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4174.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Fetcher considers connection failure only when http.connect throws exception. > In kubernetes environment, where there can be intermediate proxies, > getInputStream from http connection can throw connection reset error (5xx). > These errors should be considered as connection failures as well. > {code:java} > 2020-05-08 17:03:54.080 WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch > Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, > attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, > pathComponent=attempt_1588982534035__1_00_00_0_10030, spillType=0, > spillId=-1] Informing ShuffleManager: > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:210) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) > at > 
org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
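A minimal sketch of the classification TEZ-4174 asks for, assuming the fetcher already knows whether the failure happened during the connect call. The class and method names below are hypothetical and are not taken from the Tez Fetcher; only `java.net.SocketException` is a real type here.

```java
import java.io.IOException;
import java.net.SocketException;

/** Hypothetical helper; not part of the Tez Fetcher. */
final class FetchFailureClassifier {
  /**
   * Returns true if the failure should be reported as a connection failure:
   * either the connect call itself failed, or a later read (for example
   * getInputStream) failed with a SocketException such as "Connection reset".
   */
  static boolean isConnectFailure(boolean failedDuringConnect, IOException e) {
    return failedDuringConnect || e instanceof SocketException;
  }
}
```

Note that `java.net.ConnectException` is a subclass of `SocketException`, so it is covered by the same check.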
[jira] [Updated] (TEZ-4068) Prevent new speculative attempt after task has issued canCommit to an attempt
[ https://issues.apache.org/jira/browse/TEZ-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4068: -- Fix Version/s: 0.10.0 > Prevent new speculative attempt after task has issued canCommit to an attempt > - > > Key: TEZ-4068 > URL: https://issues.apache.org/jira/browse/TEZ-4068 > Project: Apache Tez > Issue Type: Improvement >Reporter: Jonathan Turner Eagles >Assignee: Ying Han >Priority: Major > Fix For: 0.10.0, 0.10.1, 0.9.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When a running attempt calls TaskImpl#canCommit through the taskUmbilical, > the TaskImpl will issue a "go" if it is the first attempt to do so. Otherwise > it will issue a "no-go". After commitAttempt is assigned in TaskImpl, no > other attempt is allowed to succeed at that point. So a speculative attempt > that is launched after commitAttempt is assigned can never finish before > the original, since it will always be given a "no-go" in the canCommit > response. In this jira, I propose to discuss disabling speculative attempts > after commitAttempt has been assigned. -- This message was sent by Atlassian Jira (v8.3.4#803005)
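The go/no-go behaviour described in TEZ-4068 can be sketched as below. This is a simplified stand-in, not TaskImpl: the class `CommitGate`, the use of plain strings as attempt ids, and the choice to answer "go" again when the chosen attempt asks a second time are assumptions made for illustration.

```java
/** Hypothetical stand-in for the commit decision; not the Tez TaskImpl. */
final class CommitGate {
  private String commitAttempt; // null until some attempt has been given a "go"

  /** The first attempt to ask gets a "go"; any other attempt gets a "no-go". */
  synchronized boolean canCommit(String attemptId) {
    if (commitAttempt == null) {
      commitAttempt = attemptId;
      return true;
    }
    // Assumption: the chosen attempt may ask again and still gets a "go".
    return commitAttempt.equals(attemptId);
  }

  /** A speculative attempt started after a "go" could never commit, so skip it. */
  synchronized boolean shouldLaunchSpeculativeAttempt() {
    return commitAttempt == null;
  }
}
```

A speculator would consult `shouldLaunchSpeculativeAttempt()` before scheduling another attempt, which is the check the issue proposes to discuss.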
[jira] [Updated] (TEZ-4057) Fix Unsorted broadcast shuffle umasks
[ https://issues.apache.org/jira/browse/TEZ-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4057: -- Fix Version/s: 0.10.0 > Fix Unsorted broadcast shuffle umasks > - > > Key: TEZ-4057 > URL: https://issues.apache.org/jira/browse/TEZ-4057 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.2 >Reporter: Gopal Vijayaraghavan >Assignee: Eric Wohlstadter >Priority: Major > Fix For: 0.10.0, 0.10.1, 0.9.3 > > Attachments: TEZ-4057.1.patch > > > {code} > if (numPartitions == 1 && !pipelinedShuffle) { > //special case, where in only one partition is available. > finalOutPath = outputFileHandler.getOutputFileForWrite(); > finalIndexPath = > outputFileHandler.getOutputIndexFileForWrite(indexFileSizeEstimate); > skipBuffers = true; > writer = new IFile.Writer(conf, rfs, finalOutPath, keyClass, valClass, > codec, outputRecordsCounter, outputRecordBytesCounter); > } else { > skipBuffers = false; > writer = null; > } > {code} > The broadcast events don't update the file umasks, because they have 1 > partition. > {code} > total 8.0K > -rw--- 1 hive hadoop 15 Mar 27 20:30 file.out > -rw-r- 1 hive hadoop 32 Mar 27 20:30 file.out.index > {code} > ending up with readable index files and unreadable .out files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4113) TezUtils::createByteStringFromConf should use snappy instead of DeflaterOutputStream
[ https://issues.apache.org/jira/browse/TEZ-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4113: -- Fix Version/s: 0.10.0 > TezUtils::createByteStringFromConf should use snappy instead of > DeflaterOutputStream > > > Key: TEZ-4113 > URL: https://issues.apache.org/jira/browse/TEZ-4113 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Trivial > Fix For: 0.10.0, 0.10.1 > > Attachments: Screenshot 2020-01-10 at 6.32.31 AM.png, TEZ-4113.1.patch > > > Under concurrent workload, where lots of short running DAGs were submitted in > Hive, HS2 spikes up heavily on CPU due to > {{TezUtils::createByteStringFromConf}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4207) Provide approximate number of input records to be processed in UnorderedKVInput
[ https://issues.apache.org/jira/browse/TEZ-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4207: -- Fix Version/s: 0.10.0 > Provide approximate number of input records to be processed in > UnorderedKVInput > --- > > Key: TEZ-4207 > URL: https://issues.apache.org/jira/browse/TEZ-4207 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4207.1.patch, TEZ-4207.wip.patch > > > There are cases when broadcasted data is loaded into a hashtable in upstream > applications (e.g. Hive). Apps tend to predict the number of entries in the > hashtable diligently, but there are cases where these estimates can be very > complicated at compile time. > > Tez can help in such cases by providing an "approximate number of input records" > counter for the records to be processed in UnorderedKVInput. This is to avoid an expensive > rehash when hashtable sizes are not estimated correctly. It would be good to > start with broadcast first and then move on to the unordered partitioned case > later. > > This would help in predicting the number of entries at runtime and give > better estimates for the hashtable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
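To illustrate why an approximate record count helps, here is a small sketch of presizing a `java.util.HashMap` so that it does not rehash while loading. The helper class `HashTableSizing` and its method names are hypothetical, and how the approximate count would be read from the Tez counter is left out; Hive's own hashtable implementation may size itself differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sizing helper; assumes the approximate record count is already known. */
final class HashTableSizing {
  /** Smallest capacity that holds approxRecords entries without exceeding the load factor. */
  static int initialCapacityFor(long approxRecords, float loadFactor) {
    long needed = (long) Math.ceil(approxRecords / (double) loadFactor);
    // Keep the result between HashMap's default (16) and maximum (2^30) capacity.
    return (int) Math.min(Math.max(needed, 16L), 1L << 30);
  }

  /** Creates a map sized up front so that loading approxRecords entries avoids a rehash. */
  static <K, V> Map<K, V> newPresizedMap(long approxRecords) {
    float loadFactor = 0.75f;
    return new HashMap<>(initialCapacityFor(approxRecords, loadFactor), loadFactor);
  }
}
```

If the estimate is too low the map still works and simply resizes, so an approximate counter is sufficient for this purpose.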
[jira] [Updated] (TEZ-3998) Allow CONCURRENT edge property in DAG construction and introduce ConcurrentSchedulingType
[ https://issues.apache.org/jira/browse/TEZ-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3998: -- Fix Version/s: 0.10.0 > Allow CONCURRENT edge property in DAG construction and introduce > ConcurrentSchedulingType > - > > Key: TEZ-3998 > URL: https://issues.apache.org/jira/browse/TEZ-3998 > Project: Apache Tez > Issue Type: Task >Reporter: Yingda Chen >Assignee: Yingda Chen >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-3998.001.patch.diff > > Time Spent: 10m > Remaining Estimate: 0h > > This is the first task related to TEZ-3997 > > |Note: There is no API change in this proposed change. The majority of this > change will be lifting some existing constraints against CONCURRENT edge > type, and addition of a VertexManagerPlugin implementation.| > > This includes enabling the CONCURRENT SchedulingType as a valid edge > property, by removing all the sanity checks against CONCURRENT during DAG > construction/execution. A new VertexManagerPlugin (namely > VertexManagerWithConcurrentInput) will be implemented for vertices with > incoming concurrent edge(s). > In addition, we will assume in this change that > * A vertex *cannot* have both SEQUENTIAL and CONCURRENT incoming edges > * No shuffle or data movement is handled by Tez framework when two vertices > are connected through a CONCURRENT edge. Instead, runtime should be > responsible for handling all the data-plane communications (as proposed in > [1]). > Note that the above assumptions are common for scenarios such as whole-DAG or > sub-graph gang scheduling, but they may be relaxed in later implementation, > which may allow a mixture of SEQUENTIAL and CONCURRENT edges on the same vertex. > > Most of the (meaningful) scheduling decisions today in Tez are made based on > the notion of (or an extended version of) source task completion. This will > no longer be true in the presence of a CONCURRENT edge. 
Instead, events such as > source vertex configured, or source task running will become more relevant > when making scheduling decisions for two vertices connected via a CONCURRENT > edge. We therefore introduce a new enum *ConcurrentSchedulingType* to > describe the “scheduling timing” for the downstream vertex in such scenarios. > |public enum ConcurrentSchedulingType{ > /** * trigger downstream vertex tasks scheduling by "configured" event of > upstream vertices */ > SOURCE_VERTEX_CONFIGURED, > /** * trigger downstream vertex tasks scheduling by "running" event of > upstream tasks */ > SOURCE_TASK_STARTED > }| > > Note that in this change, we will only use SOURCE_VERTEX_CONFIGURED as the > scheduling type, which suffices for scenarios of whole-DAG or sub-graph > gang-scheduling, where we want (all the tasks in) the downstream vertex to be > scheduled together with (all the tasks in) the upstream vertex. In this case, > we can leverage the existing onVertexStateUpdated() interface of > VertexManagerPlugin to collect relevant information to assist the scheduling > decision, and *there is no additional API change necessary*. However, in more > subtle cases such as the parameter-server example described in Fig. 1, other > scheduling types would be more relevant, therefore the placeholder for > *ConcurrentSchedulingType* will be introduced in this change as part of the > infrastructure work. > > Finally, since we assume that all communications between two vertices > connected via a CONCURRENT edge are handled by the application runtime, a > CONCURRENT edge will be assigned a DummyEdgeManager that basically mutes all > DME/VME handling. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4037) Add back DAG search status KILLED
[ https://issues.apache.org/jira/browse/TEZ-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4037: -- Fix Version/s: 0.10.0 > Add back DAG search status KILLED > -- > > Key: TEZ-4037 > URL: https://issues.apache.org/jira/browse/TEZ-4037 > Project: Apache Tez > Issue Type: Task > Components: UI >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4037.001.patch > > > https://issues.apache.org/jira/browse/TEZ-2447 removed KILLED since a search on > this status can sometimes fail to find all KILLED DAGs. This jira re-adds the KILLED DAG > status search since it still has value; the focus should rather be on fixing the > DAGs that fail to write killed status to the history log file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4041) TestExtServicesWithLocalMode fails in docker
[ https://issues.apache.org/jira/browse/TEZ-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4041: -- Fix Version/s: 0.10.0 > TestExtServicesWithLocalMode fails in docker > > > Key: TEZ-4041 > URL: https://issues.apache.org/jira/browse/TEZ-4041 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4041.001.patch > > > {code} > 2019-02-13 00:24:33,703 INFO [DAGAppMaster Thread] service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.tez.dag.app.DAGAppMaster failed in state INITED > org.apache.tez.dag.api.TezUncheckedException: > java.lang.reflect.InvocationTargetException > at > org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215) > at > org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184) > at > org.apache.tez.dag.app.TaskCommunicatorManager.(TaskCommunicatorManager.java:152) > at > org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1088) > at > org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:532) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2606) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686) > at > org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2603) > at org.apache.tez.client.LocalClient$1.run(LocalClient.java:327) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213) > ... 12 more > Caused by: java.lang.NullPointerException > at > org.apache.tez.test.service.rpc.TezTestServiceProtocolProtos$SubmitWorkRequestProto$Builder.setUser(TezTestServiceProtocolProtos.java:5549) > at > org.apache.tez.dag.app.taskcomm.TezTestServiceTaskCommunicatorImpl.(TezTestServiceTaskCommunicatorImpl.java:65) > ... 17 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3989) Fix by-laws related to emeritus clause
[ https://issues.apache.org/jira/browse/TEZ-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3989: -- Fix Version/s: 0.10.0 > Fix by-laws related to emeritus clause > --- > > Key: TEZ-3989 > URL: https://issues.apache.org/jira/browse/TEZ-3989 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Major > Fix For: 0.10.0, 0.10.1 > > > The emeritus clause is not valid and needs to be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress
[ https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3982: -- Fix Version/s: 0.10.0 > DAGAppMaster and tasks should not report negative or invalid progress > - > > Key: TEZ-3982 > URL: https://issues.apache.org/jira/browse/TEZ-3982 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1, 0.10.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, > TEZ-3982.003.patch, TEZ-3982.004.patch, TEZ-3982.005.branch-0.9.patch > > > The AM fails (AMRMClient expects non-negative progress) if any component reports > invalid or negative progress. DAGAppMaster/Tasks should check and report > accordingly to allow the AM to execute. -- This message was sent by Atlassian Jira (v8.3.4#803005)
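One way to guard against invalid progress values, as a rough sketch: the issue only says that components "should check and report accordingly", so clamping to the range [0, 1] and mapping NaN to 0 are assumptions made here, and `ProgressSanitizer` is a hypothetical name rather than a Tez class.

```java
/** Hypothetical guard for progress values; the clamping policy is an assumption. */
final class ProgressSanitizer {
  /** Maps NaN to 0 and clamps everything else into the range [0, 1]. */
  static float sanitize(float progress) {
    if (Float.isNaN(progress)) {
      return 0.0f;
    }
    return Math.max(0.0f, Math.min(1.0f, progress));
  }
}
```

Infinite values are handled by the clamp as well: positive infinity becomes 1 and negative infinity becomes 0.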
[jira] [Updated] (TEZ-4032) TEZ will throw "Client cannot authenticate via:[TOKEN, KERBEROS]" when used with HDFS federation(non viewfs, only hdfs schema used).
[ https://issues.apache.org/jira/browse/TEZ-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4032: -- Fix Version/s: 0.10.0 > TEZ will throw "Client cannot authenticate via:[TOKEN, KERBEROS]" when used > with HDFS federation(non viewfs, only hdfs schema used). > -- > > Key: TEZ-4032 > URL: https://issues.apache.org/jira/browse/TEZ-4032 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4032.001.patch, TEZ-4032.002.patch, > TEZ-4032.003.patch, TEZ-4032.004.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > I run a Hive on Tez job on a cluster with HDFS federation and Kerberos. The Hadoop cluster > has multiple namespaces (hdfs://ns1, hdfs://ns2, hdfs://ns3, ...) and we don't > use the viewfs scheme. The Hive on Tez job throws the following error when the table > is created in hdfs://ns2 (the default configuration is fs.defaultFS=hdfs://ns1): > {code:java} > 2019-01-21 15:43:46,507 [WARN] [TezChild] |ipc.Client|: Exception encountered > while connecting to the server : > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS] > 2019-01-21 15:43:46,507 [INFO] [TezChild] |retry.RetryInvocationHandler|: > java.io.IOException: DestHost:destPort docker5.cmss.com:8020 , > LocalHost:localPort docker1.cmss.com/10.254.10.116:0. Failed on local > exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS], while invoking > ClientNamenodeProtocolTranslatorPB.getFileInfo over > docker5.cmss.com/10.254.2.106:8020 after 14 failover attempts. Trying to > failover after sleeping for 10827ms. 
> 2019-01-21 15:43:57,338 [WARN] [TezChild] |ipc.Client|: Exception encountered > while connecting to the server : > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS] > 2019-01-21 15:43:57,363 [ERROR] [TezChild] |tez.MapRecordSource|: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing writable (null) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:568) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > at > 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > DestHost:destPort docker4.cmss.com:8020 , LocalHost:localPort > docker1.cmss.com/10.254.10.116:0. Failed on local exception: > java.io.IOException: org.apache.hadoop.security.AccessControlException: > Client cannot authenticate via:[TOKEN, KERBEROS] > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator
[jira] [Updated] (TEZ-4208) Pipelinesorter uses single SortSpan after spill
[ https://issues.apache.org/jira/browse/TEZ-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4208: -- Fix Version/s: 0.10.0 > Pipelinesorter uses single SortSpan after spill > --- > > Key: TEZ-4208 > URL: https://issues.apache.org/jira/browse/TEZ-4208 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4208.1.patch, TEZ-4208.2.patch, q67_sorter.log > > > Though it could have created multiple spans, Tez always uses the first span > after a spill. It is quite possible that other spans are bigger than the > first one, due to progressive space allocation. Fixing this would help in > reducing the number of spills (depending on the jobs) and put less load on > indexcache entries (as fewer files have to be opened). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4036) TestMockDAGAppMaster#testInternalPreemption should assert for failed state
[ https://issues.apache.org/jira/browse/TEZ-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4036: -- Fix Version/s: 0.10.0 > TestMockDAGAppMaster#testInternalPreemption should assert for failed state > -- > > Key: TEZ-4036 > URL: https://issues.apache.org/jira/browse/TEZ-4036 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4036.001.patch > > > Due to root cause mentioned in TEZ-3950, the test fails regularly. Until the > fix for that JIRA is in (which is rather a good amount of redesign) , adding > failed assert to the test as this is now an expected state for the task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4156) Fix Tez to reuse IPC connections
[ https://issues.apache.org/jira/browse/TEZ-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4156: -- Fix Version/s: 0.10.0 > Fix Tez to reuse IPC connections > > > Key: TEZ-4156 > URL: https://issues.apache.org/jira/browse/TEZ-4156 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4156.1.patch, TEZ-4156.2.patch, TEZ-4156.3.patch, > TEZ-4156.4.patch > > > When tracking DAG progress, TezClientUtils ends up creating a new remote user. > Because of this new UGI creation, IPC connections are not reused internally. > https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java#L965 > More info from the Hadoop side: > In Hadoop's IPC layer, connectionIds are checked based on > UserGroupInformation. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1600 > However, UserGroupInformation comparison is based on == > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1789 -- This message was sent by Atlassian Jira (v8.3.4#803005)
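The effect described in TEZ-4156 can be reproduced without Hadoop on the classpath. In the sketch below, `RemoteUser` and `ConnectionCache` are hypothetical stand-ins for UserGroupInformation and the IPC client's connection table. The only property carried over from the issue is that equality of the user object is identity-based, so a freshly created user never matches a cached connection.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a user object with identity-based equality (no equals/hashCode override). */
final class RemoteUser {
  final String name;

  RemoteUser(String name) {
    this.name = name;
  }
}

/** Stand-in for a connection table keyed by the user object. */
final class ConnectionCache {
  private final Map<RemoteUser, Object> connections = new HashMap<>();
  private int created = 0;

  /** Returns the cached connection for this user object, creating one if none exists. */
  Object getConnection(RemoteUser user) {
    return connections.computeIfAbsent(user, u -> {
      created++;
      return new Object(); // placeholder for a real connection
    });
  }

  int connectionsCreated() {
    return created;
  }
}
```

Calling `getConnection(new RemoteUser("a"))` twice creates two connections even though the name is the same, while holding on to one `RemoteUser` instance and passing it to every call reuses the first connection. Reusing the user object is the kind of change the issue title points at.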
[jira] [Updated] (TEZ-4172) Let tasks be killed after too many overall attempts
[ https://issues.apache.org/jira/browse/TEZ-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4172: -- Fix Version/s: 0.10.0 > Let tasks be killed after too many overall attempts > --- > > Key: TEZ-4172 > URL: https://issues.apache.org/jira/browse/TEZ-4172 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Fix For: 0.10.0, 0.10.1, 0.9.3 > > Attachments: TEZ-4172.01.patch, TEZ-4172.02.patch > > > Currently, TaskImpl doesn't consider failing a task if there are too many > overall attempts. In case of LLAP, the number of preempted task attempts -> > overall task attempts [can grow in a > linkedhashmap|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L127]. > In an edge case, where an upstream application (Hive LLAP) cannot cope with a > problematic query, this can also lead to OOM in the AM, due to the very high > number of TaskAttemptImpl objects. > It would be beneficial to have the chance to limit the overall number of task > attempts, regardless of whether they have failed or been killed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
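A minimal sketch of the limit proposed in TEZ-4172, assuming a single counter that covers every attempt regardless of how it ended. The class `AttemptLimiter`, its method name, and the convention that a non-positive limit disables the check are hypothetical and not taken from the patch.

```java
/** Hypothetical overall-attempt limiter; not the TaskImpl change from the patch. */
final class AttemptLimiter {
  private final int maxTotalAttempts; // assumed setting; a value <= 0 disables the limit
  private int totalAttempts = 0;

  AttemptLimiter(int maxTotalAttempts) {
    this.maxTotalAttempts = maxTotalAttempts;
  }

  /**
   * Counts one more attempt, whether it later fails, is killed, or is preempted.
   * Returns true when the task has exceeded the limit and should be failed.
   */
  boolean onAttemptScheduled() {
    totalAttempts++;
    return maxTotalAttempts > 0 && totalAttempts > maxTotalAttempts;
  }
}
```

Because the counter is a single integer, the check itself adds no per-attempt memory, which matters in the OOM scenario the issue describes.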
[jira] [Updated] (TEZ-3976) Batch ShuffleManager error report events
[ https://issues.apache.org/jira/browse/TEZ-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3976: -- Fix Version/s: 0.10.0 > Batch ShuffleManager error report events > > > Key: TEZ-3976 > URL: https://issues.apache.org/jira/browse/TEZ-3976 > Project: Apache Tez > Issue Type: Bug >Reporter: Jaume M >Assignee: Jaume M >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-3976.1.patch, TEZ-3976.2.patch, TEZ-3976.3.patch, > TEZ-3976.4.patch, TEZ-3976.5.patch, TEZ-3976.6.patch, TEZ-3976.7.patch, > TEZ-3976.8.patch, TEZ-3976.9.patch > > > The symptoms are a lot of these logs are being shown: > {code:java} > 2018-06-15T18:09:35,811 INFO [Fetcher_B {Reducer_5} #0 ()] > org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: > Fetch failed for src: InputAttemptIdentifier [inputIdentifier=701, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, > spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=701, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, > spillId=0], connectFailed: true > 2018-06-15T18:09:35,811 WARN [Fetcher_B {Reducer_5} #1 ()] > org.apache.tez.runtime.library.common.shuffle.Fetcher: copyInputs failed for > tasks [InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, > spillId=0]] > 2018-06-15T18:09:35,811 INFO [Fetcher_B {Reducer_5} #1 ()] > org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: > Fetch failed for src: InputAttemptIdentifier [inputIdentifier=589, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, > spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=589, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, > spillId=0], connectFailed: true > {code} > Each 
of those translates into an event in the AM, which finally crashes due to > OOM after around 30 minutes and around 10 million shuffle input errors (and > 10 million lines like the previous ones). When the ShuffleManager is closed > and the counters are reported, there are many shuffle input errors; some of those > logs are: > {code:java} > 2018-06-15T17:46:30,988 INFO [TezTR-441963_21_34_4_0_4 > (152901963_0021_34_04_00_4)] runtime.LogicalIOProcessorRuntimeTask: > Final Counters for attempt_152901963_0021_34_04_00_4: Counters: 43 > [[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, > NUM_SHUFFLED_INPUTS=26, NUM_FAILED_SHUFFLE_INPUTS=858965, > INPUT_RECORDS_PROCESSED=26, OUTPUT_RECORDS=1, OUTPUT_LARGE_RECORDS=0, > OUTPUT_BYTES=779472, OUTPUT_BYTES_WITH_OVERHEAD=779483, > OUTPUT_BYTES_PHYSICAL=780146, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, > ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, > SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, > SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_BYTES_TO_DISK=0, > SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_PHASE_TIME=52516, > FIRST_EVENT_RECEIVED=1, LAST_EVENT_RECEIVED=1185][HIVE > RECORDS_OUT_INTERMEDIATE_Reducer_12=1, > RECORDS_OUT_OPERATOR_GBY_159=1, > RECORDS_OUT_OPERATOR_RS_160=1][TaskCounter_Reducer_12_INPUT_Map_11 > FIRST_EVENT_RECEIVED=1, INPUT_RECORDS_PROCESSED=26, > LAST_EVENT_RECEIVED=1185, NUM_FAILED_SHUFFLE_INPUTS=858965, > NUM_SHUFFLED_INPUTS=26, SHUFFLE_BYTES=4207563, > SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_DISK_DIRECT=826947, > SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=3380616, > SHUFFLE_PHASE_TIME=52516][TaskCounter_Reducer_12_OUTPUT_Map_1 > ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, > ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_PHYSICAL=780146, > OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, > SPILLED_RECORDS=0]] > 2018-06-15T17:46:32,271 
INFO [TezTR-441963_21_34_3_15_1 ()] > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Final Counters for > attempt_152901963_0021_34_03_15_1: Counters: 87 [[File System > Counters FILE_BYTES_READ=0, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, > FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=2344929, > HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=5, HDFS_LARGE_READ_OPS=0, > HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter > SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=1, NUM_FAILED_SHUFFLE_INPUTS=105195, > INPUT_RECORDS_PROCESSED=397, INPUT_SPLIT_LENGTH_BYTES=21563271, > OUTPUT_RECORDS=15737, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=1235818, > OUTPUT_BYTES_WITH_OVERHEAD=1267307, OUTPUT_
[jira] [Updated] (TEZ-4213) Bound appContext executor capacity using a configurable property
[ https://issues.apache.org/jira/browse/TEZ-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4213: -- Fix Version/s: 0.10.0 > Bound appContext executor capacity using a configurable property > > > Key: TEZ-4213 > URL: https://issues.apache.org/jira/browse/TEZ-4213 > Project: Apache Tez > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: 0.10_blocker > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4213.01.patch, TEZ-4213.02.patch, TEZ-4213.03.patch, > TEZ-4213.04.patch, TEZ-4213.05.patch, TEZ-4213.06.patch, TEZ-4213.07.patch, > TEZ-4213.08.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > After TEZ-4170 was merged, the appContext executor pool is also used by the > RootInputInitializerManager to speed up SplitGeneration. > However, this executor pool currently has no capacity limit: > https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L624 > The problem that occurs when generating splits for larger inputs (thousands or > more) is that it can result in > {color:red}java.lang.OutOfMemoryError{color} > which is also reproducible with a test case. > https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L130 > To avoid such errors, I propose to limit the capacity of this pool to a > configurable value, which could default to, for example, the number of physical > cores. -- This message was sent by Atlassian Jira (v8.3.4#803005)
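As an illustration of the proposal in TEZ-4213 above, here is a minimal sketch of a thread pool whose size comes from a configured value and falls back to the processor count. It is not the actual patch: the class name, the method names and the "non-positive means use the default" convention are assumptions made for this example, and availableProcessors() reports logical rather than physical cores.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not taken from the Tez code base.
final class BoundedExecutorSketch {

  // Assumed convention: a non-positive configured value selects the default,
  // which here is the number of processors visible to the JVM.
  static int resolvePoolSize(int configured) {
    return configured > 0 ? configured : Runtime.getRuntime().availableProcessors();
  }

  // Builds a pool that never runs more than the resolved number of threads.
  // Extra tasks wait in the queue instead of each getting a new thread.
  static ThreadPoolExecutor newBoundedPool(int configured) {
    int size = resolvePoolSize(configured);
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        size, size, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    // Let idle threads exit so an unused pool does not pin resources.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}
```

With this shape, submitting thousands of split-generation tasks keeps at most the configured number of threads alive; the queue itself is still unbounded in this sketch.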
[jira] [Updated] (TEZ-4062) Speculative attempt scheduling should be aborted when Task has completed
[ https://issues.apache.org/jira/browse/TEZ-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4062: -- Fix Version/s: 0.10.0 > Speculative attempt scheduling should be aborted when Task has completed > > > Key: TEZ-4062 > URL: https://issues.apache.org/jira/browse/TEZ-4062 > Project: Apache Tez > Issue Type: Bug >Reporter: Yingda Chen >Assignee: Ying Han >Priority: Major > Fix For: 0.10.0, 0.10.1, 0.9.3 > > Attachments: TEZ-4062.001.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In RedundantScheduleTransition (inside TaskImpl), we try to find the oldest > running attempt and use it as the causal attempt when doing > "addAndScheduleAttempt". > > However, the task may have completed at this moment, i.e., the task attempt > that was considered running and long-tailed by the speculator is now completed. > In this case, we may not be able to find any unfinished attempt, which will > lead to an NPE in the following logic (even without the NPE, it makes no sense to > proceed with scheduling a speculative attempt anyway). -- This message was sent by Atlassian Jira (v8.3.4#803005)
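The guard described in TEZ-4062 above could look roughly like the following sketch. It is not TaskImpl code: the Attempt class, its fields and both method names are hypothetical stand-ins used only to show the "no unfinished attempt, so do not schedule" check.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model for illustration only; the real logic lives in TaskImpl.
final class SpeculationGuardSketch {

  // Stand-in for a task attempt: an id, a launch time and a finished flag.
  static final class Attempt {
    final int id;
    final long launchTime;
    final boolean finished;

    Attempt(int id, long launchTime, boolean finished) {
      this.id = id;
      this.launchTime = launchTime;
      this.finished = finished;
    }
  }

  // Returns the unfinished attempt with the earliest launch time, if any.
  static Optional<Attempt> oldestRunning(List<Attempt> attempts) {
    Attempt oldest = null;
    for (Attempt a : attempts) {
      if (!a.finished && (oldest == null || a.launchTime < oldest.launchTime)) {
        oldest = a;
      }
    }
    return Optional.ofNullable(oldest);
  }

  // Speculation proceeds only if the task is still open and a running attempt
  // exists to act as the causal attempt; otherwise it is aborted.
  static boolean shouldScheduleSpeculative(boolean taskCompleted, List<Attempt> attempts) {
    if (taskCompleted) {
      return false;
    }
    return oldestRunning(attempts).isPresent();
  }
}
```

Returning an Optional makes the "nothing left to speculate on" case explicit at the call site instead of surfacing later as a null dereference.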
[jira] [Updated] (TEZ-4075) Tez: Reimplement tez.runtime.transfer.data-via-events.enabled
[ https://issues.apache.org/jira/browse/TEZ-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4075: -- Fix Version/s: 0.10.0 > Tez: Reimplement tez.runtime.transfer.data-via-events.enabled > - > > Key: TEZ-4075 > URL: https://issues.apache.org/jira/browse/TEZ-4075 > Project: Apache Tez > Issue Type: Improvement >Reporter: Gopal Vijayaraghavan >Assignee: Richard Zhang >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4075.10.patch, TEZ-4075.15.patch, TEZ-4075.16.patch, > TEZ-4075.enable-dme.16.patch, Tez-4075.5.patch, Tez-4075.8.patch > > Time Spent: 50m > Remaining Estimate: 0h > > This was factored out by TEZ-2196, which does skip buffers for 1-partition > data exchanges (therefore goes to disk directly). > {code} > if (shufflePayload.hasData()) { > DataProto dataProto = shufflePayload.getData(); > FetchedInput fetchedInput = > inputAllocator.allocate(dataProto.getRawLength(), > dataProto.getCompressedLength(), srcAttemptIdentifier); > moveDataToFetchedInput(dataProto, fetchedInput, hostIdentifier); > shuffleManager.addCompletedInputWithData(srcAttemptIdentifier, > fetchedInput); > } else { > shuffleManager.addKnownInput(shufflePayload.getHost(), > shufflePayload.getPort(), srcAttemptIdentifier, srcIndex); > } > {code} > got removed in > https://github.com/apache/tez/commit/1ba1f927c16a1d7c273b6cd1a8553e5269d1541a > It would be better to buffer up to the 512-byte limit for the event size before > writing to disk, since creating a new file always incurs disk traffic, even > if the file is eventually being served out of the buffer cache. > The total overhead of receiving an event, then firing an HTTP call to fetch > the data etc. adds approx. 100-150ms to a query - the data transfer through the > event will skip the disk entirely for this and also remove the extra IOPS > incurred. 
> This channel is not suitable for large-scale event transport, but > specifically the workload here deals with 1-row control tables which consume > more bandwidth with HTTP headers and hostnames than the 93 byte payload. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-1348) Allow Tez local mode to run against filesystems other than local FS
[ https://issues.apache.org/jira/browse/TEZ-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-1348: -- Fix Version/s: 0.10.0 > Allow Tez local mode to run against filesystems other than local FS > --- > > Key: TEZ-1348 > URL: https://issues.apache.org/jira/browse/TEZ-1348 > Project: Apache Tez > Issue Type: Sub-task > Environment: Committed to branch-0.9. >Reporter: Siddharth Seth >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: tez-1348.patch, tez-1348.patch, tez-1348.txt > > Time Spent: 20m > Remaining Estimate: 0h > > In TEZ-717, I incorrectly thought setting fs.defaultFS programmatically in > tez-site would work for local mode. > Currently the requirement is that tez-site.xml must have fs.defaultFS set to > file:///. > While that works, it doesn't allow for seamless execution in either > local-mode or on a cluster. > The main issue here is that when Inputs / Outputs are configured - they use a > version of configuration which reads tez-site, and do not use the > configuration from the client itself (which is correct behaviour). > I am not sure what a good way to fix this is: > 1) It may be possible to override this value each time an instance of > Configuration/TezConfiguration is created. One possible way would be to > statically add a default resource to Configuration the moment a local client > is created. > 2) Provide information in the contexts on whether this is local or not. This > is fairly ugly, and would get in the way of running mixed mode tasks. > Does anyone have other suggestions? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3988) Update snapshot version in master to 0.10.1-SNAPSHOT
[ https://issues.apache.org/jira/browse/TEZ-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3988: -- Fix Version/s: 0.10.0 > Update snapshot version in master to 0.10.1-SNAPSHOT > > > Key: TEZ-3988 > URL: https://issues.apache.org/jira/browse/TEZ-3988 > Project: Apache Tez > Issue Type: Task >Reporter: Eric Wohlstadter >Assignee: Eric Wohlstadter >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-3988.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4086) Some Tez examples cannot work with outputPaths on a FS other than the default FS
[ https://issues.apache.org/jira/browse/TEZ-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4086: -- Fix Version/s: 0.10.0 > Some Tez examples cannot work with outputPaths on a FS other than the default > FS > > > Key: TEZ-4086 > URL: https://issues.apache.org/jira/browse/TEZ-4086 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4086.01.txt, TEZ-4086.02.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > There are several examples which make use of the FileSystem based on the > default config. > This results in failure if the outputPath is on a different FileSystem (e.g. > fs.defaultFS set to HDFS and outputPath for the example set to s3). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3972) Tez DAG can hang when a single task fails to fetch
[ https://issues.apache.org/jira/browse/TEZ-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3972: -- Fix Version/s: 0.10.0 > Tez DAG can hang when a single task fails to fetch > -- > > Key: TEZ-3972 > URL: https://issues.apache.org/jira/browse/TEZ-3972 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-3972.001.patch, TEZ-3972.002.patch, > TEZ-3972.003.patch > > > Description of the hung DAG: > A DAG with 2 vertices. {{Map}} Vertex has 22k maps, downstream vertex > {{Reduce}} has 1009 tasks. All tasks succeed but one, which hangs. This one > task (attempt) is doing a local fetch from a node that (now) has a bad disk. > It fails to fetch and reports to the AM for the offending input attempt > identifiers. However the AM does not schedule a re-run as > {{uniquefailedOutputReports}} size is 1 (since only this task attempt failed > to fetch) and failure fraction is not met. The denominator for this fraction > is the total number of tasks. That causes the re-run to never occur. This > JIRA tracks the AM side of the change to alleviate this problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
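To make the arithmetic in TEZ-3972 above concrete, the sketch below computes a failure fraction with the total task count as denominator. The class, the method names and the 0.1 threshold in the usage note are illustrative assumptions; the message does not state the configured threshold, and this sketch also assumes the denominator is the 1009 downstream tasks.

```java
// Hypothetical arithmetic sketch; not the AM's actual decision code.
final class FailureFractionSketch {

  // Fraction of consumer tasks that reported a fetch failure for one output.
  static double failureFraction(int uniqueFailedReports, int totalTasks) {
    if (totalTasks <= 0) {
      throw new IllegalArgumentException("totalTasks must be positive");
    }
    return (double) uniqueFailedReports / totalTasks;
  }

  // A re-run is triggered only when the fraction reaches the threshold.
  static boolean wouldRerun(int uniqueFailedReports, int totalTasks, double threshold) {
    return failureFraction(uniqueFailedReports, totalTasks) >= threshold;
  }
}
```

With one reporting attempt out of 1009 tasks the fraction is roughly 0.001, so wouldRerun(1, 1009, 0.1) is false under the assumed threshold, which matches the hang described: a single stuck consumer can never push the fraction high enough on its own.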
[jira] [Updated] (TEZ-4204) Data race in RootInputInitializerManager
[ https://issues.apache.org/jira/browse/TEZ-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4204: -- Fix Version/s: 0.10.0 > Data race in RootInputInitializerManager > > > Key: TEZ-4204 > URL: https://issues.apache.org/jira/browse/TEZ-4204 > Project: Apache Tez > Issue Type: Bug >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Blocker > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4204.1.patch, TEZ-4204.1.patch, TEZ-4204.2.patch > > > After https://issues.apache.org/jira/browse/TEZ-4170 there is a data race for > initializerMap in RootInputInitializerManager. initializerMap should be > initialized before vertex state is set to initializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4044) Zookeeper: exclude jline from Zookeeper client from tez dist
[ https://issues.apache.org/jira/browse/TEZ-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4044: -- Fix Version/s: 0.10.0 > Zookeeper: exclude jline from Zookeeper client from tez dist > > > Key: TEZ-4044 > URL: https://issues.apache.org/jira/browse/TEZ-4044 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Gopal Vijayaraghavan >Assignee: Gopal Vijayaraghavan >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4044.1.patch > > > {code} > [INFO] | +- org.apache.zookeeper:zookeeper:jar:3.4.9:compile > [INFO] | | \- jline:jline:jar:0.9.94:compile > {code} > Breaks CLI clients further down the dependency tree. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4021) API incompatibility wro4j-maven-plugin
[ https://issues.apache.org/jira/browse/TEZ-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4021: -- Fix Version/s: 0.10.0 > API incompatibility wro4j-maven-plugin > -- > > Key: TEZ-4021 > URL: https://issues.apache.org/jira/browse/TEZ-4021 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4021.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4102) Let session credentials be merged before merging am launch credentials
[ https://issues.apache.org/jira/browse/TEZ-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4102: -- Fix Version/s: 0.10.0 > Let session credentials be merged before merging am launch credentials > -- > > Key: TEZ-4102 > URL: https://issues.apache.org/jira/browse/TEZ-4102 > Project: Apache Tez > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4102.01.patch, TEZ-4102.02.patch, TEZ-4102.03.patch, > TEZ-4102.04.patch, TEZ-4102.05.patch, TEZ-4102.06.patch > > > Given the following scenario: Kerberos + long-running session + DAGs keep > being submitted to the session. > After 24h the queries can fail because tasks don't have the correct > HDFS_DELEGATION_TOKEN: there is a chance that the AM credentials have been > previously filled with tokens, which cannot be overridden by session > credentials > [here|https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java#L485] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4098) tez-tools improvements: log-split, swimlane
[ https://issues.apache.org/jira/browse/TEZ-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4098: -- Fix Version/s: 0.10.0 > tez-tools improvements: log-split, swimlane > --- > > Key: TEZ-4098 > URL: https://issues.apache.org/jira/browse/TEZ-4098 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4098.01.patch, TEZ-4098.02.patch, TEZ-4098.03.patch, > TEZ-4098.03.patch, TEZ-4098.03.patch, TEZ-4098.03.patch, TEZ-4098.04.patch, > TEZ-4098.05.patch > > > While using tez-tools for analyzing application logs, I'm about to improve > them a little bit. Details will be added here to the description. > 1. Support swimlane.sh to consume local file > 2. Create a log splitter, which is able to split the aggregated log file into > separate container directories, like below: > {code} > ├── container_e02_1572948601374_0004_01_01 > │ ├── container-localizer-syslog > │ ├── dag_1572948601374_0004_1.dot > │ ├── prelaunch.err > │ ├── prelaunch.out > │ ├── stderr > │ ├── stdout > │ ├── syslog > │ ├── syslog_dag_1572948601374_0004_1 > │ └── syslog_dag_1572948601374_0004_1_post > ├── container_e02_1572948601374_0004_01_02 > │ ├── prelaunch.err > │ ├── prelaunch.out > │ ├── stderr > │ ├── stdout > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4179) [Kubernetes] Extend NodeId in tez to support unique worker identity
[ https://issues.apache.org/jira/browse/TEZ-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4179: -- Fix Version/s: 0.10.0 > [Kubernetes] Extend NodeId in tez to support unique worker identity > --- > > Key: TEZ-4179 > URL: https://issues.apache.org/jira/browse/TEZ-4179 > Project: Apache Tez > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Attila Magyar >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4179.1.patch, TEZ-4179.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In a Kubernetes environment, where pods can have the same host name and port, there > can be situations where node trackers retain an old instance of the > pod in their cache. In the case of Hive LLAP, where the LLAP Tez task scheduler > maintains the membership of nodes based on ZooKeeper registry events, there > can be cases where a NODE_ADDED event followed by a NODE_REMOVED event ends up > removing the node/host from node trackers because of the stable hostname and > service port. The NODE_REMOVED event in this case is an old, stale event of the > already dead pod, but ZK will send it only after session timeout (in case of > non-graceful shutdown). If this sequence of events happens, a node/host is > completely lost from the scheduler's perspective. > To support this scenario, Tez can extend YARN's NodeId to include > uniqueIdentifier. The LLAP task scheduler can construct the container object with > this new NodeId that includes uniqueIdentifier as well, so that stale events > like the above will only remove the host/node that matches the old > uniqueIdentifier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
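The effect of adding a unique identifier, as proposed in TEZ-4179 above, can be shown with a small stand-alone model. This is a sketch under assumptions: NodeKey and Tracker are hypothetical classes invented for this example, not YARN's NodeId or any Tez node tracker, and the identifier is modelled as a plain string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model for illustration only; not YARN's NodeId or Tez code.
final class UniqueNodeIdSketch {

  // Node identity made of host, port and a per-instance unique identifier.
  static final class NodeKey {
    final String host;
    final int port;
    final String uniqueId;

    NodeKey(String host, int port, String uniqueId) {
      this.host = host;
      this.port = port;
      this.uniqueId = uniqueId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof NodeKey)) {
        return false;
      }
      NodeKey other = (NodeKey) o;
      return port == other.port && host.equals(other.host) && uniqueId.equals(other.uniqueId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(host, port, uniqueId);
    }
  }

  // Tracks the live instance per host:port and ignores stale removals.
  static final class Tracker {
    private final Map<String, NodeKey> live = new HashMap<>();

    void nodeAdded(NodeKey node) {
      live.put(node.host + ":" + node.port, node);
    }

    // Removes the entry only if it still refers to the same instance.
    boolean nodeRemoved(NodeKey node) {
      return live.remove(node.host + ":" + node.port, node);
    }

    boolean contains(String host, int port) {
      return live.containsKey(host + ":" + port);
    }
  }
}
```

If a replacement pod registers on the same host and port before the old pod's removal event arrives, nodeRemoved for the old identity returns false and the new instance stays tracked.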
[jira] [Updated] (TEZ-4035) Tez master breaks with YARN 3.2.0 ApplicationReport API change
[ https://issues.apache.org/jira/browse/TEZ-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4035: -- Fix Version/s: 0.10.0 > Tez master breaks with YARN 3.2.0 ApplicationReport API change > -- > > Key: TEZ-4035 > URL: https://issues.apache.org/jira/browse/TEZ-4035 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Jonathan Turner Eagles >Priority: Minor > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4035.001.patch > > > {noformat} > tez/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/NotRunningJob.java:[89,29] > no suitable method found for > newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,int,int,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,,java.lang.String,float,java.lang.String,) > [ERROR] method > org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token) > is not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > 
org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String) > is not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String) > is not applicable{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4049) Fix findbugs issues in NotRunningJob
[ https://issues.apache.org/jira/browse/TEZ-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4049: -- Fix Version/s: 0.10.0 > Fix findbugs issues in NotRunningJob > > > Key: TEZ-4049 > URL: https://issues.apache.org/jira/browse/TEZ-4049 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4049.001.patch > > > Introduced by TEZ-4035. Remove fixes while keeping 3.2.0 api compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4050) maven site is failing due to missing configuration.
[ https://issues.apache.org/jira/browse/TEZ-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4050: -- Fix Version/s: 0.10.0 > maven site is failing due to missing configuration. > --- > > Key: TEZ-4050 > URL: https://issues.apache.org/jira/browse/TEZ-4050 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4050.001.patch > > > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-site-plugin:3.4:stage (default-cli) on project > tez-docs: Missing site information in the distribution management of the > project Tez (org.apache.tez:tez-docs:0.10.1-SNAPSHOT) -> [Help 1] > {code} > From the maven site plugin usage we can see we are missing configuration. > https://maven.apache.org/plugins/maven-site-plugin/usage.html > {code} > <project> > ... > <distributionManagement> > <site> > <id>www.yourcompany.com</id> > <url>scp://www.yourcompany.com/www/docs/project/</url> > </site> > </distributionManagement> > ... > </project> > {code} > Tez does not use this url to deploy and neither does hadoop. But it is needed > to stage site documentation. The url is only used during site:deploy, which is > never called during the Tez QA step. > This jira aims to provide a placeholder (the same as hadoop) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4048) Make proto history logger queue size configurable
[ https://issues.apache.org/jira/browse/TEZ-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4048: -- Fix Version/s: 0.10.0 > Make proto history logger queue size configurable > - > > Key: TEZ-4048 > URL: https://issues.apache.org/jira/browse/TEZ-4048 > Project: Apache Tez > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4048.1.patch > > > Currently, the queue size is hard-coded to 10K, which may be small for some > bigger clusters. Make it configurable and bump up the default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
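A configurable queue size, as requested in TEZ-4048 above, might be read along these lines. Everything named here is an assumption for illustration: the property key is hypothetical, the default of 100000 is only an example of "bumping up" from 10K, and java.util.Properties stands in for the Hadoop/Tez configuration object.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; the key and the default are not taken from the Tez patch.
final class HistoryQueueSketch {

  // Assumed property name, used only for this example.
  static final String QUEUE_SIZE_KEY = "example.history.logging.queue.size";

  // Illustrative default, larger than the previously hard-coded 10K.
  static final int DEFAULT_QUEUE_SIZE = 100_000;

  // Reads the queue size, falling back to the default when the key is unset.
  static int queueSize(Properties conf) {
    String raw = conf.getProperty(QUEUE_SIZE_KEY);
    if (raw == null) {
      return DEFAULT_QUEUE_SIZE;
    }
    int value = Integer.parseInt(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(QUEUE_SIZE_KEY + " must be positive: " + value);
    }
    return value;
  }

  // Creates the bounded event queue with the configured capacity.
  static <T> LinkedBlockingQueue<T> newQueue(Properties conf) {
    return new LinkedBlockingQueue<>(queueSize(conf));
  }
}
```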
[jira] [Updated] (TEZ-4012) Add docker support for Tez.
[ https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4012: -- Fix Version/s: 0.10.0 > Add docker support for Tez. > --- > > Key: TEZ-4012 > URL: https://issues.apache.org/jira/browse/TEZ-4012 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4012.001.patch, TEZ-4012.002.patch, > TEZ-4012.003.patch > > > Hadoop label builds contain a mix of development tools and versions. In > particular H11-H20 are unusable by tez since protoc -version is 2.6.x and > hadoop only supports 2.5.0. This jira will allow builds across all H1-H20 > jenkins machines. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4206) TestSpeculation.testBasicSpeculationPerVertexConf is flaky
[ https://issues.apache.org/jira/browse/TEZ-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4206: -- Fix Version/s: 0.10.0 > TestSpeculation.testBasicSpeculationPerVertexConf is flaky > -- > > Key: TEZ-4206 > URL: https://issues.apache.org/jira/browse/TEZ-4206 > Project: Apache Tez > Issue Type: Bug >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > Fix For: 0.10.0, 0.10.1, 0.9.3 > > Attachments: TEZ-4206.1.patch > > > Test is flaky due to timing issue in MockDAGAppMaster's clock and > LegacySpeculator > [https://builds.apache.org/job/PreCommit-TEZ-Build/491/] > [https://builds.apache.org/job/PreCommit-TEZ-Build/492/] > [https://builds.apache.org/job/PreCommit-TEZ-Build/493/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4034) Column selector filter should be case-insensitive
[ https://issues.apache.org/jira/browse/TEZ-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4034: -- Fix Version/s: 0.10.0 > Column selector filter should be case-insensitive > - > > Key: TEZ-4034 > URL: https://issues.apache.org/jira/browse/TEZ-4034 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jacob Tolar >Assignee: Jacob Tolar >Priority: Minor > Fix For: 0.9.2, 0.10.0, 0.10.1 > > Attachments: TEZ-4034.1.patch, image-2019-02-01-09-33-24-480.png > > > In this dialog box: > > !image-2019-02-01-09-33-24-480.png! > > The filter is case-sensitive. So if I type lower-case 'd', I see 'Id' but not > 'Dag Name'. It would be nice if the search ignored case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
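The behaviour requested in TEZ-4034 above amounts to normalising case on both sides of the comparison. The sketch below shows that idea in Java for consistency with the other examples in this digest; the Tez UI itself is not written in Java, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of case-insensitive substring filtering.
final class ColumnFilterSketch {

  // Keeps the column names that contain the query, ignoring case.
  static List<String> filter(List<String> columns, String query) {
    String needle = query.toLowerCase(Locale.ROOT);
    List<String> matches = new ArrayList<>();
    for (String column : columns) {
      if (column.toLowerCase(Locale.ROOT).contains(needle)) {
        matches.add(column);
      }
    }
    return matches;
  }
}
```

With the column names from the report, typing a lower-case "d" would then match both "Id" and "Dag Name".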
[jira] [Updated] (TEZ-4088) Create in-memory ifile writer for transferring smaller payloads (follow up of TEZ-4075)
[ https://issues.apache.org/jira/browse/TEZ-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4088: -- Fix Version/s: 0.10.0 > Create in-memory ifile writer for transferring smaller payloads (follow up of > TEZ-4075) > --- > > Key: TEZ-4088 > URL: https://issues.apache.org/jira/browse/TEZ-4088 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0, 0.10.1 > > Attachments: TEZ-4088.1.patch, TEZ-4088.2.patch, TEZ-4088.3.patch, > TEZ-4088.5.patch, TEZ-4088.6.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TEZ-4075 enabled data transfer over DME for smaller payloads. This helps in > reducing shuffle. > However, it still incurs disk IO cost (+flush) in producer side. It would be > good to retain smaller payloads in mem, so that disk IO costs can be saved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
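The idea in TEZ-4088 above, keeping small payloads in memory and touching disk only when they grow, can be sketched as a writer that buffers up to a limit and spills afterwards. This is not the IFile writer from the patch: the class, its methods and the use of a temp file are assumptions for illustration, and the 512-byte limit in the usage note is borrowed from the event-size figure in TEZ-4075.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a buffer-then-spill writer; not Tez's IFile writer.
final class SmallPayloadBufferSketch {
  private final int memLimitBytes;
  private final ByteArrayOutputStream mem = new ByteArrayOutputStream();
  private Path spillFile;
  private OutputStream spill;

  SmallPayloadBufferSketch(int memLimitBytes) {
    this.memLimitBytes = memLimitBytes;
  }

  // Buffers in memory while under the limit; otherwise moves everything
  // written so far to a temp file and continues writing there.
  void write(byte[] data) {
    try {
      if (spill == null && mem.size() + data.length <= memLimitBytes) {
        mem.write(data);
        return;
      }
      if (spill == null) {
        spillFile = Files.createTempFile("payload-sketch", ".bin");
        spill = Files.newOutputStream(spillFile);
        mem.writeTo(spill);
        mem.reset();
      }
      spill.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // True while no disk file has been created.
  boolean isInMemory() {
    return spill == null;
  }

  // Returns the buffered payload; only valid if nothing was spilled.
  byte[] inMemoryBytes() {
    if (!isInMemory()) {
      throw new IllegalStateException("payload was spilled to disk");
    }
    return mem.toByteArray();
  }

  // Closes and deletes the spill file, if one was created.
  void close() {
    try {
      if (spill != null) {
        spill.close();
        Files.deleteIfExists(spillFile);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A writer created with a 512-byte limit keeps a 93-byte payload entirely in memory, so no file is created or flushed for it.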
[jira] [Updated] (TEZ-4048) Make proto history logger queue size configurable
[ https://issues.apache.org/jira/browse/TEZ-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4048: -- Fix Version/s: (was: 0.10.1) > Make proto history logger queue size configurable > - > > Key: TEZ-4048 > URL: https://issues.apache.org/jira/browse/TEZ-4048 > Project: Apache Tez > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4048.1.patch > > > Currently, the queue size is hard-coded to 10K, which may be small for some > bigger clusters. Make it configurable and bump up the default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress
[ https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3982: -- Fix Version/s: (was: 0.10.1) > DAGAppMaster and tasks should not report negative or invalid progress > - > > Key: TEZ-3982 > URL: https://issues.apache.org/jira/browse/TEZ-3982 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1, 0.10.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, > TEZ-3982.003.patch, TEZ-3982.004.patch, TEZ-3982.005.branch-0.9.patch > > > The AM fails (AMRMClient expects non-negative progress) if any component reports > invalid or negative progress. DAGAppMaster/Tasks should check and report > accordingly to allow the AM to execute. -- This message was sent by Atlassian Jira (v8.3.4#803005)
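One way to "check and report accordingly", as TEZ-3982 above puts it, is to sanitise the value before it is handed on. The sketch below is an assumption about what such a check could look like: the class and method names are hypothetical, and the choice to map NaN and negatives to 0 and to cap values at 1 is this example's policy, not something stated in the message.

```java
// Hypothetical sanitiser for illustration only; the policy is an assumption.
final class ProgressSketch {

  // Maps NaN and negative values to 0, caps values above 1 at 1,
  // and passes valid progress through unchanged.
  static float sanitize(float progress) {
    if (Float.isNaN(progress) || progress < 0.0f) {
      return 0.0f;
    }
    if (progress > 1.0f) {
      return 1.0f;
    }
    return progress;
  }
}
```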
[jira] [Updated] (TEZ-4044) Zookeeper: exclude jline from Zookeeper client from tez dist
[ https://issues.apache.org/jira/browse/TEZ-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4044: -- Fix Version/s: (was: 0.10.1) > Zookeeper: exclude jline from Zookeeper client from tez dist > > > Key: TEZ-4044 > URL: https://issues.apache.org/jira/browse/TEZ-4044 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Gopal Vijayaraghavan >Assignee: Gopal Vijayaraghavan >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4044.1.patch > > > {code} > [INFO] | +- org.apache.zookeeper:zookeeper:jar:3.4.9:compile > [INFO] | | \- jline:jline:jar:0.9.94:compile > {code} > Breaks CLI clients further down the dependency tree. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4037) Add back DAG search status KILLED
[ https://issues.apache.org/jira/browse/TEZ-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4037: -- Fix Version/s: (was: 0.10.1) > Add back DAG search status KILLED > -- > > Key: TEZ-4037 > URL: https://issues.apache.org/jira/browse/TEZ-4037 > Project: Apache Tez > Issue Type: Task > Components: UI >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4037.001.patch > > > https://issues.apache.org/jira/browse/TEZ-2447 removed KILLED since sometimes > a search on this status can fail to find all KILLED DAGs. This jira re-adds the KILLED DAG > status search since it still has value; the focus should rather be on fixing the > DAGs that fail to write the killed status to the history log file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4174) [Kubernetes] Fetcher should connection failure on SocketException
[ https://issues.apache.org/jira/browse/TEZ-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4174: -- Fix Version/s: (was: 0.10.1) > [Kubernetes] Fetcher should connection failure on SocketException > - > > Key: TEZ-4174 > URL: https://issues.apache.org/jira/browse/TEZ-4174 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4174.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Fetcher considers connection failure only when http.connect throws exception. > In kubernetes environment, where there can be intermediate proxies, > getInputStream from http connection can throw connection reset error (5xx). > These errors should be considered as connection failures as well. > {code:java} > 2020-05-08 17:03:54.080 WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch > Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, > attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, > pathComponent=attempt_1588982534035__1_00_00_0_10030, spillType=0, > spillId=-1] Informing ShuffleManager: > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:210) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) > at > 
org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
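The classification requested in TEZ-4174 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the class FetchFailureClassifier and the duringSetup flag are hypothetical names, and the sketch uses nothing from Tez itself. It shows a SocketException raised while the connection is still being set up (as in the getInputStream frame of the trace above) being treated like a failed connect.

```java
import java.io.IOException;
import java.net.SocketException;

// Hypothetical helper, not part of Tez: decides whether an I/O error raised
// while setting up a fetch should be reported as a connection failure.
final class FetchFailureClassifier {
    private FetchFailureClassifier() {}

    // Assumption: duringSetup is true while the fetcher is still connecting or
    // opening the input stream, i.e. before any shuffle data has been read.
    static boolean isConnectFailure(IOException e, boolean duringSetup) {
        if (!duringSetup) {
            return false; // later errors are read failures, not connect failures
        }
        // java.net.ConnectException extends SocketException, so a failing
        // connect call is covered by the same check as "Connection reset".
        return e instanceof SocketException;
    }
}
```

With such a check, the "Connection reset" shown in the log would be reported with connectFailed set, the same way a refused connection is.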
[jira] [Updated] (TEZ-4042) Speculative attempts should avoid running on the same node
[ https://issues.apache.org/jira/browse/TEZ-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4042: -- Fix Version/s: (was: 0.10.1) > Speculative attempts should avoid running on the same node > -- > > Key: TEZ-4042 > URL: https://issues.apache.org/jira/browse/TEZ-4042 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Ying Han >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4042.001.patch > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4035) Tez master breaks with YARN 3.2.0 ApplicationReport API change
[ https://issues.apache.org/jira/browse/TEZ-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4035: -- Fix Version/s: (was: 0.10.1) > Tez master breaks with YARN 3.2.0 ApplicationReport API change > -- > > Key: TEZ-4035 > URL: https://issues.apache.org/jira/browse/TEZ-4035 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Jonathan Turner Eagles >Priority: Minor > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4035.001.patch > > > {noformat} > tez/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/NotRunningJob.java:[89,29] > no suitable method found for > newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,int,int,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,,java.lang.String,float,java.lang.String,) > [ERROR] method > org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token) > is not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > 
org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String) > is not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > org.apache.hadoop.yarn.api.records.ApplicationReport.newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,org.apache.hadoop.yarn.api.records.Token,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,long,long,long,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport,java.lang.String,float,java.lang.String,org.apache.hadoop.yarn.api.records.Token,java.util.Set,boolean,org.apache.hadoop.yarn.api.records.Priority,java.lang.String,java.lang.String) > is not applicable{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4021) API incompatibility wro4j-maven-plugin
[ https://issues.apache.org/jira/browse/TEZ-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4021: -- Fix Version/s: (was: 0.10.1) > API incompatibility wro4j-maven-plugin > -- > > Key: TEZ-4021 > URL: https://issues.apache.org/jira/browse/TEZ-4021 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4021.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3952) Allow Tez task speculation to grant greater customization of certain parameters
[ https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3952: -- Fix Version/s: (was: 0.10.1) > Allow Tez task speculation to grant greater customization of certain > parameters > --- > > Key: TEZ-3952 > URL: https://issues.apache.org/jira/browse/TEZ-3952 > Project: Apache Tez > Issue Type: Improvement >Reporter: Nishant Dash >Assignee: Nishant Dash >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-3952.001.patch, TEZ-3952.002.patch, > TEZ-3952.003.patch, TEZ-3952.004.patch, TEZ-3952.005.patch, TEZ-3952.006.patch > > > Many of the settings for Tez task speculation are hardcoded and should > instead be configurable. For example, there are no equivalent config settings > for the following MapReduce settings: > - mapreduce.job.speculative.speculative-cap-running-tasks > - mapreduce.job.speculative.retry-after-no-speculate > - mapreduce.job.speculative.retry-after-speculate > - mapreduce.job.speculative.minimum-allowed-tasks > - mapreduce.job.speculative.speculative-cap-total-tasks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4113) TezUtils::createByteStringFromConf should use snappy instead of DeflaterOutputStream
[ https://issues.apache.org/jira/browse/TEZ-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4113: -- Fix Version/s: (was: 0.10.1) > TezUtils::createByteStringFromConf should use snappy instead of > DeflaterOutputStream > > > Key: TEZ-4113 > URL: https://issues.apache.org/jira/browse/TEZ-4113 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Trivial > Fix For: 0.10.0 > > Attachments: Screenshot 2020-01-10 at 6.32.31 AM.png, TEZ-4113.1.patch > > > Under concurrent workload, where lots of short running DAGs were submitted in > Hive, HS2 spikes up heavily on CPU due to > {{TezUtils::createByteStringFromConf}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3989) Fix by-laws related to emeritus clause
[ https://issues.apache.org/jira/browse/TEZ-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3989: -- Fix Version/s: (was: 0.10.1) > Fix by-laws related to emeritus clause > --- > > Key: TEZ-3989 > URL: https://issues.apache.org/jira/browse/TEZ-3989 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Major > Fix For: 0.10.0 > > > The emeritus clause is not valid and needs to be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4066) Upgrade servlet-api from 2.5 to 3.1.0
[ https://issues.apache.org/jira/browse/TEZ-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4066: -- Fix Version/s: (was: 0.10.1) > Upgrade servlet-api from 2.5 to 3.1.0 > - > > Key: TEZ-4066 > URL: https://issues.apache.org/jira/browse/TEZ-4066 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4066.001.patch, TEZ-4066.002.patch > > > Oozie launcher jobs trying to launch Tez jobs now fail to render Oozie > Launcher Job AM due to both 2.5 (from tez) and 3.1.0 (from hadoop) > servlet-api both being in the classpath. Tez should sync with servlet api > version from tez master branch that only supports hadoop 3+ > {code} > 2019-04-30 14:53:02,747 WARN [qtp1213419524-119] > org.eclipse.jetty.server.HttpChannel: > java.lang.NoSuchMethodError: > javax.servlet.http.HttpServletRequest.isAsyncStarted()Z > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:688) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > at > org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at 
org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4172) Let tasks be killed after too many overall attempts
[ https://issues.apache.org/jira/browse/TEZ-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4172: -- Fix Version/s: (was: 0.10.1) > Let tasks be killed after too many overall attempts > --- > > Key: TEZ-4172 > URL: https://issues.apache.org/jira/browse/TEZ-4172 > Project: Apache Tez > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Fix For: 0.10.0, 0.9.3 > > Attachments: TEZ-4172.01.patch, TEZ-4172.02.patch > > > Currently, TaskImpl doesn't consider failing a task if there are too many > overall attempts. In case of LLAP, the number of preempted task attempts -> > overall task attempts [can grow in a > linkedhashmap|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L127]. > In an edge case, where an upstream application (Hive LLAP) cannot cope with a > problematic query, this can also lead to OOM in the AM, due to the very high > number of TaskAttemptImpl objects. > It would be beneficial to have the chance to limit the overall number of task > attempts, regardless of whether they have failed or been killed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
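The limit proposed in TEZ-4172 could look roughly like the sketch below. It is a simplified model and not the real TaskImpl: the class AttemptTracker, the maxTotalAttempts setting, and the string states are hypothetical names chosen for illustration. The only idea taken from the issue is that every finished attempt counts toward the limit, whether it failed or was killed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model; not the real TaskImpl.
final class AttemptTracker {
    // Attempt number -> final state, kept in insertion order.
    private final Map<Integer, String> attempts = new LinkedHashMap<>();
    // Assumed setting: a value of 0 or less disables the limit.
    private final int maxTotalAttempts;

    AttemptTracker(int maxTotalAttempts) {
        this.maxTotalAttempts = maxTotalAttempts;
    }

    // Records an attempt that ended as "FAILED" or "KILLED" (for example, preempted).
    void recordFinishedAttempt(String state) {
        attempts.put(attempts.size(), state);
    }

    // True once the overall number of attempts reaches the limit,
    // regardless of whether the attempts failed or were killed.
    boolean shouldFailTask() {
        return maxTotalAttempts > 0 && attempts.size() >= maxTotalAttempts;
    }
}
```

Bounding the map this way also bounds the number of attempt objects the AM has to keep for a single task.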
[jira] [Updated] (TEZ-4068) Prevent new speculative attempt after task has issued canCommit to an attempt
[ https://issues.apache.org/jira/browse/TEZ-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4068: -- Fix Version/s: (was: 0.10.1) > Prevent new speculative attempt after task has issued canCommit to an attempt > - > > Key: TEZ-4068 > URL: https://issues.apache.org/jira/browse/TEZ-4068 > Project: Apache Tez > Issue Type: Improvement >Reporter: Jonathan Turner Eagles >Assignee: Ying Han >Priority: Major > Fix For: 0.10.0, 0.9.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When a running attempt calls TaskImpl#canCommit through the taskUmbilical, > the TaskImpl will issue a "go" if it is the first attempt to do so. Otherwise > it will issue a "no-go". After commitAttempt is assigned in TaskImpl, no > other attempt is allowed to succeed at that point. So a speculative attempt > that is launched after commitAttempt is assigned can never finish before > the original since it will always be given a "no-go" in the canCommit > response. In this jira, I propose to discuss disabling speculative attempts > after commitAttempt has been assigned. -- This message was sent by Atlassian Jira (v8.3.4#803005)
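The go/no-go behaviour described in TEZ-4068 can be modelled with a short sketch. This is not the real TaskImpl: the class CommitArbiter and its methods are hypothetical, and answering "go" again to the attempt that already holds the commit is an assumption of the sketch. It shows why an attempt launched after commitAttempt is assigned can only ever receive a "no-go".

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the go/no-go decision; not the real TaskImpl.
final class CommitArbiter {
    // Id of the attempt that was given the "go", or null if none yet.
    private final AtomicReference<String> commitAttempt = new AtomicReference<>();

    // Returns true ("go") for the first attempt that asks. Assumption: the same
    // attempt also gets "go" if it asks again. Every other attempt gets "no-go".
    boolean canCommit(String attemptId) {
        return commitAttempt.compareAndSet(null, attemptId)
            || attemptId.equals(commitAttempt.get());
    }

    // The proposal under discussion: once a commit attempt is assigned,
    // launching a further speculative attempt cannot help.
    boolean shouldSpeculate() {
        return commitAttempt.get() == null;
    }
}
```

A speculator consulting shouldSpeculate() before launching would skip exactly the attempts that the issue describes as unable to finish first.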
[jira] [Updated] (TEZ-4041) TestExtServicesWithLocalMode fails in docker
[ https://issues.apache.org/jira/browse/TEZ-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4041: -- Fix Version/s: (was: 0.10.1) > TestExtServicesWithLocalMode fails in docker > > > Key: TEZ-4041 > URL: https://issues.apache.org/jira/browse/TEZ-4041 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4041.001.patch > > > {code} > 2019-02-13 00:24:33,703 INFO [DAGAppMaster Thread] service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.tez.dag.app.DAGAppMaster failed in state INITED > org.apache.tez.dag.api.TezUncheckedException: > java.lang.reflect.InvocationTargetException > at > org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215) > at > org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184) > at > org.apache.tez.dag.app.TaskCommunicatorManager.(TaskCommunicatorManager.java:152) > at > org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1088) > at > org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:532) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2606) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686) > at > org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2603) > at org.apache.tez.client.LocalClient$1.run(LocalClient.java:327) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213) > ... 12 more > Caused by: java.lang.NullPointerException > at > org.apache.tez.test.service.rpc.TezTestServiceProtocolProtos$SubmitWorkRequestProto$Builder.setUser(TezTestServiceProtocolProtos.java:5549) > at > org.apache.tez.dag.app.taskcomm.TezTestServiceTaskCommunicatorImpl.(TezTestServiceTaskCommunicatorImpl.java:65) > ... 17 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4096) SSLFactory should pickup configs from incoming conf payload
[ https://issues.apache.org/jira/browse/TEZ-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4096: -- Fix Version/s: (was: 0.10.1) > SSLFactory should pickup configs from incoming conf payload > --- > > Key: TEZ-4096 > URL: https://issues.apache.org/jira/browse/TEZ-4096 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4096.1.patch, TEZ-4096.2.patch, TEZ-4096.3.patch > > > SSLFactory uses "String" instead of "Path" for adding "ssl-client.xml". When > addResource is invoked with a string, {{Configuration}} tries to find it in > the classloader and does not load the file correctly. > [https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/http/SSLFactory.java#L107] > Conf: > [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L3064] > This causes an issue when ssl-client.xml is located in a path other > than the classpath. -- This message was sent by Atlassian Jira (v8.3.4#803005)
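The difference between the two lookups in TEZ-4096 can be reproduced with the JDK alone. The sketch below does not call Hadoop or Tez; ResourceLookupDemo and its methods are hypothetical names, and it assumes the temporary directory it creates is not on the classpath. It shows that a file which is perfectly readable as a file-system path is not found when its location is handed to a classloader as a resource name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-alone illustration; it uses only the JDK and does not call Hadoop or Tez.
final class ResourceLookupDemo {
    private ResourceLookupDemo() {}

    // Resolves the name the way a classpath resource is resolved.
    static boolean foundOnClasspath(String name) {
        URL url = ResourceLookupDemo.class.getClassLoader().getResource(name);
        return url != null;
    }

    // Resolves the same location as a plain file-system path.
    static boolean foundOnFileSystem(Path path) {
        return Files.isReadable(path);
    }

    // Creates an ssl-client.xml in a fresh temporary directory,
    // which is assumed to be outside the classpath.
    static Path createSampleConfig() {
        try {
            Path dir = Files.createTempDirectory("tez-4096-demo");
            Path file = dir.resolve("ssl-client.xml");
            Files.write(file, "<configuration/>".getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a file created by createSampleConfig(), foundOnFileSystem(file) succeeds while foundOnClasspath(file.toString()) does not, which mirrors the String versus Path distinction the issue describes.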
[jira] [Updated] (TEZ-4047) Tez trademark in xml is causing xml parsing issue
[ https://issues.apache.org/jira/browse/TEZ-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4047: -- Fix Version/s: (was: 0.10.1) > Tez trademark in xml is causing xml parsing issue > - > > Key: TEZ-4047 > URL: https://issues.apache.org/jira/browse/TEZ-4047 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4047.001.patch > > > {code} > docs/src/site/site.xml: > [Fatal Error] site.xml:97:34: The entity "reg" was referenced, but not > declared. > java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: > file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: > 34; The entity "reg" was referenced, but not declared. > at > jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397) > at > jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:449) > at > jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:406) > at > jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402) > at > jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155) > at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264) > at com.sun.tools.script.shell.Main.evaluateString(Main.java:298) > at com.sun.tools.script.shell.Main.evaluateString(Main.java:319) > at com.sun.tools.script.shell.Main.access$300(Main.java:37) > at com.sun.tools.script.shell.Main$3.run(Main.java:217) > at com.sun.tools.script.shell.Main.main(Main.java:48) > Caused by: org.xml.sax.SAXParseException; systemId: > file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: > 34; The entity "reg" was referenced, but not declared. 
> at > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257) > at > com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339) > at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205) > at > jdk.nashorn.internal.scripts.Script$Recompilation$2$19313A$\^system_init\_.XMLDocument(:747) > at jdk.nashorn.internal.scripts.Script$1$\^string\_.:program(:1) > at > jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637) > at > jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494) > at > jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393) > ... 10 more > {code} > Also output from xmllint verifies xml issue as well. > {code} > xmllint ./docs/src/site/site.xml > .//src/site/site.xml:97: parser error : Entity 'reg' not defined > http://tez.apache.org/"/> > ^ > .//src/site/site.xml:123: parser error : Entity 'reg' not defined > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4204) Data race in RootInputInitializerManager
[ https://issues.apache.org/jira/browse/TEZ-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4204: -- Fix Version/s: (was: 0.10.1) > Data race in RootInputInitializerManager > > > Key: TEZ-4204 > URL: https://issues.apache.org/jira/browse/TEZ-4204 > Project: Apache Tez > Issue Type: Bug >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Blocker > Fix For: 0.10.0 > > Attachments: TEZ-4204.1.patch, TEZ-4204.1.patch, TEZ-4204.2.patch > > > After https://issues.apache.org/jira/browse/TEZ-4170 there is a data race for > initializerMap in RootInputInitializerManager. initializerMap should be > initialized before vertex state is set to initializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4156) Fix Tez to reuse IPC connections
[ https://issues.apache.org/jira/browse/TEZ-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4156: -- Fix Version/s: (was: 0.10.1) > Fix Tez to reuse IPC connections > > > Key: TEZ-4156 > URL: https://issues.apache.org/jira/browse/TEZ-4156 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4156.1.patch, TEZ-4156.2.patch, TEZ-4156.3.patch, > TEZ-4156.4.patch > > > When tracking DAG progress, TezClientUtils ends up creating a new remote user. > Because of this new UGI creation, IPC connections are not reused internally. > https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java#L965 > More info from Hadoop side: > In hadoop's IPC layer, connectionIds are checked based on > UserGroupInformation. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1600 > However, UserGroupInformation comparison is based on == > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1789 -- This message was sent by Atlassian Jira (v8.3.4#803005)
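The effect described in TEZ-4156 follows from using an identity-compared object as a cache key, which the sketch below reproduces with toy classes. User and ConnectionCache are hypothetical stand-ins, not Hadoop's UserGroupInformation or its IPC client; the only property carried over from the issue is that two user objects are equal only when they are the same instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not Hadoop or Tez classes.
final class IdentityKeyDemo {
    private IdentityKeyDemo() {}

    // A user handle that is only equal to itself: equals and hashCode are
    // inherited from Object, so the comparison is effectively ==.
    static final class User {
        final String name;

        User(String name) {
            this.name = name;
        }
    }

    // A toy connection cache keyed by user.
    static final class ConnectionCache {
        private final Map<User, Object> connections = new HashMap<>();

        // Returns the cached connection for this user, creating one if needed.
        Object get(User user) {
            return connections.computeIfAbsent(user, u -> new Object());
        }

        int size() {
            return connections.size();
        }
    }
}
```

Calling get(new User("hive")) on every progress check adds a new entry each time, while creating the User once and passing the same instance keeps the cache at a single connection.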
[jira] [Updated] (TEZ-4088) Create in-memory ifile writer for transferring smaller payloads (follow up of TEZ-4075)
[ https://issues.apache.org/jira/browse/TEZ-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4088: -- Fix Version/s: (was: 0.10.1) > Create in-memory ifile writer for transferring smaller payloads (follow up of > TEZ-4075) > --- > > Key: TEZ-4088 > URL: https://issues.apache.org/jira/browse/TEZ-4088 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4088.1.patch, TEZ-4088.2.patch, TEZ-4088.3.patch, > TEZ-4088.5.patch, TEZ-4088.6.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TEZ-4075 enabled data transfer over DME for smaller payloads. This helps in > reducing shuffle. > However, it still incurs disk IO cost (+flush) on the producer side. It would be > good to retain smaller payloads in memory, so that disk IO costs can be saved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4040) Upgrade RoaringBitmap version to avoid NoSuchMethodError
[ https://issues.apache.org/jira/browse/TEZ-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4040: -- Fix Version/s: (was: 0.10.1) > Upgrade RoaringBitmap version to avoid NoSuchMethodError > > > Key: TEZ-4040 > URL: https://issues.apache.org/jira/browse/TEZ-4040 > Project: Apache Tez > Issue Type: Task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: 0.4.9.api.txt, 0.5.11.api.txt, 0.5.21.api.txt, > TEZ-4040.001.patch, TEZ-4040.002.patch > > > A common request is to use the runOptimize function, which is present in later > versions of RoaringBitmap. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4004) Update jetty9 to align with Hadoop and Hive
[ https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4004: -- Fix Version/s: (was: 0.10.1) > Update jetty9 to align with Hadoop and Hive > --- > > Key: TEZ-4004 > URL: https://issues.apache.org/jira/browse/TEZ-4004 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4004.001-branch-0.9.patch, TEZ-4004.001.patch > > > https://abi-laboratory.pro/index.php?view=timeline&lang=java&l=jetty > https://issues.apache.org/jira/browse/HADOOP-15815 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4206) TestSpeculation.testBasicSpeculationPerVertexConf is flaky
[ https://issues.apache.org/jira/browse/TEZ-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4206: -- Fix Version/s: (was: 0.10.1) > TestSpeculation.testBasicSpeculationPerVertexConf is flaky > -- > > Key: TEZ-4206 > URL: https://issues.apache.org/jira/browse/TEZ-4206 > Project: Apache Tez > Issue Type: Bug >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > Fix For: 0.10.0, 0.9.3 > > Attachments: TEZ-4206.1.patch > > > Test is flaky due to timing issue in MockDAGAppMaster's clock and > LegacySpeculator > [https://builds.apache.org/job/PreCommit-TEZ-Build/491/] > [https://builds.apache.org/job/PreCommit-TEZ-Build/492/] > [https://builds.apache.org/job/PreCommit-TEZ-Build/493/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4049) Fix findbugs issues in NotRunningJob
[ https://issues.apache.org/jira/browse/TEZ-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4049: -- Fix Version/s: (was: 0.10.1) > Fix findbugs issues in NotRunningJob > > > Key: TEZ-4049 > URL: https://issues.apache.org/jira/browse/TEZ-4049 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4049.001.patch > > > Introduced by TEZ-4035. Remove fixes while keeping 3.2.0 api compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4223) Adding new jars or resources after the first DAG runs does not work.
[ https://issues.apache.org/jira/browse/TEZ-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4223: -- Fix Version/s: (was: 0.10.1) > Adding new jars or resources after the first DAG runs does not work. > > > Key: TEZ-4223 > URL: https://issues.apache.org/jira/browse/TEZ-4223 > Project: Apache Tez > Issue Type: Bug >Reporter: Harish JP >Assignee: Harish JP >Priority: Major > Fix For: 0.10.0, 0.9.3 > > Attachments: TEZ-4223.02.patch, TEZ-4223.03.patch, TEZ-4223.04.patch > > > If we execute a DAG that needs additional jars after the first DAG has run, we > get a ClassNotFoundException. > > > {noformat} > 2020-08-03 13:57:14,776 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: > Added additional resources : > [[file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-pool-1.5.4.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/postgresql-42.2.8.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/hive-jdbc-handler-3.1.3000.7.2.2.0-73.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/mssql-jdbc-6.2.1.jre7.jar, > > file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-dbcp-1.4.jar]] > to classpath > org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find > class: org.apache.hive.storage.jdbc.JdbcInputFormat > Serialization trace: > inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc) > aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156) > at > 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133) > ... > ... > Caused by: java.lang.ClassNotFoundException: > org.apache.hive.storage.jdbc.JdbcInputFormat > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154) > ... 46 more{noformat} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3972) Tez DAG can hang when a single task fails to fetch
[ https://issues.apache.org/jira/browse/TEZ-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3972: -- Fix Version/s: (was: 0.10.1) > Tez DAG can hang when a single task fails to fetch > -- > > Key: TEZ-3972 > URL: https://issues.apache.org/jira/browse/TEZ-3972 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-3972.001.patch, TEZ-3972.002.patch, > TEZ-3972.003.patch > > > Description of the hung DAG: > A DAG with 2 vertices. {{Map}} Vertex has 22k maps, downstream vertex > {{Reduce}} has 1009 tasks. All tasks succeed but one, which hangs. This one > task (attempt) is doing a local fetch from a node that (now) has a bad disk. > It fails to fetch and reports to the AM for the offending input attempt > identifiers. However the AM does not schedule a re-run as > {{uniquefailedOutputReports}} size is 1 (since only this task attempt failed > to fetch) and failure fraction is not met. The denominator for this fraction > is the total number of tasks. That causes the re-run to never occur. This > JIRA tracks the AM side of the change to alleviate this problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
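The arithmetic behind the hang in TEZ-3972 can be made concrete with a small sketch. It is a simplified model, not the AM code: the class FetchFailureFraction, the parameter names, the strict "greater than" comparison, and the threshold value 0.1 used in the usage note are all assumptions for illustration. Only the numerator (unique failed reports) and the denominator (total number of tasks) come from the description.

```java
// Hypothetical model of the check described in the issue; the names and the
// comparison are illustrative and are not taken from Tez.
final class FetchFailureFraction {
    private FetchFailureFraction() {}

    // Decides whether the upstream output should be re-run, based on how large
    // a share of the consumer tasks reported a fetch failure for it.
    static boolean shouldRerun(int uniqueFailedReports, int totalTasks, double maxAllowedFraction) {
        double fraction = (double) uniqueFailedReports / totalTasks;
        return fraction > maxAllowedFraction;
    }
}
```

With the numbers from the report, shouldRerun(1, 1009, 0.1) is false because 1/1009 is far below any reasonable threshold, so a single reporting task can never trigger the re-run while the denominator stays at the total number of tasks.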
[jira] [Updated] (TEZ-4028) Events not visible from proto history logging for s3a filesystem until dag completes.
[ https://issues.apache.org/jira/browse/TEZ-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4028: -- Fix Version/s: (was: 0.10.1) > Events not visible from proto history logging for s3a filesystem until dag > completes. > - > > Key: TEZ-4028 > URL: https://issues.apache.org/jira/browse/TEZ-4028 > Project: Apache Tez > Issue Type: Bug >Reporter: Harish JP >Assignee: Harish JP >Priority: Major > Labels: history > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4028.01.patch, TEZ-4028.02.patch > > > The events are not visible in the files because the s3 filesystem > * flushes writes to local disk and only uploads/commits to s3 on close. > * does not support append. > As an initial fix, we log the dag submitted, initialized and started events > into a file; these can be read to get the dag plan and config from the AM. > The counters are not available until the dag completes anyway. > The in-progress information cannot be read from the files; it can be obtained from the AM > once we have the above events. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-3976) Batch ShuffleManager error report events
[ https://issues.apache.org/jira/browse/TEZ-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-3976: -- Fix Version/s: (was: 0.10.1) > Batch ShuffleManager error report events > > > Key: TEZ-3976 > URL: https://issues.apache.org/jira/browse/TEZ-3976 > Project: Apache Tez > Issue Type: Bug >Reporter: Jaume M >Assignee: Jaume M >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-3976.1.patch, TEZ-3976.2.patch, TEZ-3976.3.patch, > TEZ-3976.4.patch, TEZ-3976.5.patch, TEZ-3976.6.patch, TEZ-3976.7.patch, > TEZ-3976.8.patch, TEZ-3976.9.patch > > > The symptoms are a lot of these logs are being shown: > {code:java} > 2018-06-15T18:09:35,811 INFO [Fetcher_B {Reducer_5} #0 ()] > org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: > Fetch failed for src: InputAttemptIdentifier [inputIdentifier=701, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, > spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=701, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, > spillId=0], connectFailed: true > 2018-06-15T18:09:35,811 WARN [Fetcher_B {Reducer_5} #1 ()] > org.apache.tez.runtime.library.common.shuffle.Fetcher: copyInputs failed for > tasks [InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, > spillId=0]] > 2018-06-15T18:09:35,811 INFO [Fetcher_B {Reducer_5} #1 ()] > org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: > Fetch failed for src: InputAttemptIdentifier [inputIdentifier=589, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, > spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=589, > attemptNumber=0, > pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, > spillId=0], connectFailed: true > {code} > Each 
of those translates into an event in the AM, which finally crashes due to > OOM after around 30 minutes and around 10 million shuffle input errors (and > 10 million lines like the previous ones). When the ShuffleManager is closed > and the counters are reported, there are many shuffle input errors; some of those > logs are: > {code:java} > 2018-06-15T17:46:30,988 INFO [TezTR-441963_21_34_4_0_4 > (152901963_0021_34_04_00_4)] runtime.LogicalIOProcessorRuntimeTask: > Final Counters for attempt_152901963_0021_34_04_00_4: Counters: 43 > [[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, > NUM_SHUFFLED_INPUTS=26, NUM_FAILED_SHUFFLE_INPUTS=858965, > INPUT_RECORDS_PROCESSED=26, OUTPUT_RECORDS=1, OUTPUT_LARGE_RECORDS=0, > OUTPUT_BYTES=779472, OUTPUT_BYTES_WITH_OVERHEAD=779483, > OUTPUT_BYTES_PHYSICAL=780146, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, > ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, > SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, > SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_BYTES_TO_DISK=0, > SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_PHASE_TIME=52516, > FIRST_EVENT_RECEIVED=1, LAST_EVENT_RECEIVED=1185][HIVE > RECORDS_OUT_INTERMEDIATE_Reducer_12=1, > RECORDS_OUT_OPERATOR_GBY_159=1, > RECORDS_OUT_OPERATOR_RS_160=1][TaskCounter_Reducer_12_INPUT_Map_11 > FIRST_EVENT_RECEIVED=1, INPUT_RECORDS_PROCESSED=26, > LAST_EVENT_RECEIVED=1185, NUM_FAILED_SHUFFLE_INPUTS=858965, > NUM_SHUFFLED_INPUTS=26, SHUFFLE_BYTES=4207563, > SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_DISK_DIRECT=826947, > SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=3380616, > SHUFFLE_PHASE_TIME=52516][TaskCounter_Reducer_12_OUTPUT_Map_1 > ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, > ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_PHYSICAL=780146, > OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, > SPILLED_RECORDS=0]] > 2018-06-15T17:46:32,271 
INFO [TezTR-441963_21_34_3_15_1 ()] > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Final Counters for > attempt_152901963_0021_34_03_15_1: Counters: 87 [[File System > Counters FILE_BYTES_READ=0, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, > FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=2344929, > HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=5, HDFS_LARGE_READ_OPS=0, > HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter > SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=1, NUM_FAILED_SHUFFLE_INPUTS=105195, > INPUT_RECORDS_PROCESSED=397, INPUT_SPLIT_LENGTH_BYTES=21563271, > OUTPUT_RECORDS=15737, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=1235818, > OUTPUT_BYTES_WITH_OVERHEAD=1267307, OUTP
[jira] [Updated] (TEZ-4036) TestMockDAGAppMaster#testInternalPreemption should assert for failed state
[ https://issues.apache.org/jira/browse/TEZ-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4036: -- Fix Version/s: (was: 0.10.1) > TestMockDAGAppMaster#testInternalPreemption should assert for failed state > -- > > Key: TEZ-4036 > URL: https://issues.apache.org/jira/browse/TEZ-4036 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4036.001.patch > > > Due to the root cause mentioned in TEZ-3950, the test fails regularly. Until the > fix for that JIRA is in (which requires a good amount of redesign), add an > assert for the failed state to the test, as this is now an expected state for the task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-4086) Some Tez examples cannot work with outputPaths on a FS other than the default FS
[ https://issues.apache.org/jira/browse/TEZ-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4086: -- Fix Version/s: (was: 0.10.1) > Some Tez examples cannot work with outputPaths on a FS other than the default > FS > > > Key: TEZ-4086 > URL: https://issues.apache.org/jira/browse/TEZ-4086 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4086.01.txt, TEZ-4086.02.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > There are several examples which make use of the FileSystem based on the > default config. > This results in a failure if the outputPath is on a different FileSystem (e.g. > fs.defaultFS set to HDFS and outputPath for the example set to s3). -- This message was sent by Atlassian Jira (v8.3.4#803005)
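The failure mode in TEZ-4086 comes down to which URI decides the filesystem. In Hadoop terms this is the difference between `FileSystem.get(conf)`, which follows fs.defaultFS, and `path.getFileSystem(conf)`, which follows the path itself. The sketch below reproduces only that selection rule with `java.net.URI` so that it runs without Hadoop on the classpath. The class and method names, and the example URIs, are hypothetical and not taken from the patch.

```java
import java.net.URI;

final class OutputFsSchemeSketch {

    /**
     * Hypothetical helper: picks the filesystem scheme for an output path.
     * A fully qualified path (one with a scheme) wins; only scheme-less
     * paths fall back to the configured default filesystem.
     */
    static String resolveScheme(URI outputPath, URI defaultFs) {
        String scheme = outputPath.getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020"); // placeholder default FS
        // An s3a output path must not be opened through the HDFS default.
        System.out.println(resolveScheme(URI.create("s3a://bucket/out"), defaultFs));
        // A scheme-less path legitimately uses the default filesystem.
        System.out.println(resolveScheme(URI.create("/tmp/out"), defaultFs));
    }
}
```

An example that always uses the default filesystem behaves like a version of `resolveScheme` that ignores its first argument, which is the situation the issue describes.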
[jira] [Updated] (TEZ-4058) Changes for 0.9.2 release
[ https://issues.apache.org/jira/browse/TEZ-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4058: -- Fix Version/s: (was: 0.10.1) > Changes for 0.9.2 release > - > > Key: TEZ-4058 > URL: https://issues.apache.org/jira/browse/TEZ-4058 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Fix For: 0.10.0 > > Attachments: TEZ-4058.001.patch > > > Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (TEZ-1348) Allow Tez local mode to run against filesystems other than local FS
[ https://issues.apache.org/jira/browse/TEZ-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-1348: -- Fix Version/s: (was: 0.10.1) > Allow Tez local mode to run against filesystems other than local FS > --- > > Key: TEZ-1348 > URL: https://issues.apache.org/jira/browse/TEZ-1348 > Project: Apache Tez > Issue Type: Sub-task > Environment: Committed to branch-0.9. >Reporter: Siddharth Seth >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.9.2, 0.10.0 > > Attachments: tez-1348.patch, tez-1348.patch, tez-1348.txt > > Time Spent: 20m > Remaining Estimate: 0h > > In TEZ-717, I incorrectly thought setting fs.defaultFS programmatically in > tez-site would work for local mode. > Currently the requirement is that tez-site.xml must have fs.defaultFS set to > file:///. > While that works, it doesn't allow for seamless execution in either > local-mode or on a cluster. > The main issue here is that when Inputs / Outputs are configured, they use a > version of configuration which reads tez-site, and do not use the > configuration from the client itself (which is correct behaviour). > Not sure what a good way to fix this is: > 1) It may be possible to override this value each time an instance of > Configuration/TezConfiguration is created. One possible way would be to > statically add a default resource to Configuration the moment a local client > is created. > 2) Provide information in the contexts on whether this is local or not. This > is fairly ugly, and would get in the way of running mixed mode tasks. > Does anyone have other suggestions? -- This message was sent by Atlassian Jira (v8.3.4#803005)
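Option 1 in the TEZ-1348 description (a statically registered override that applies to every configuration created afterwards) can be sketched without Hadoop as follows. This is a toy model, not the committed fix: the class, its methods and the placeholder value `hdfs://namenode:8020` are assumptions for illustration. Hadoop's `Configuration.addDefaultResource` is the real mechanism the description alludes to; it is not used here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

final class LocalModeConfigSketch {

    // Hypothetical process-wide overrides, applied to every config created later.
    private static final List<Map<String, String>> DEFAULT_OVERRIDES =
            new CopyOnWriteArrayList<>();

    /** Registers overrides once, e.g. at the moment a local client is created. */
    static void addDefaultOverride(Map<String, String> overrides) {
        DEFAULT_OVERRIDES.add(new LinkedHashMap<>(overrides));
    }

    /** Builds a config from site values, then layers registered overrides on top. */
    static Map<String, String> newConfig(Map<String, String> siteValues) {
        Map<String, String> conf = new LinkedHashMap<>(siteValues);
        for (Map<String, String> overrides : DEFAULT_OVERRIDES) {
            conf.putAll(overrides);
        }
        return conf;
    }
}
```

In this model, a local client would call `addDefaultOverride` with fs.defaultFS set to file:/// before any Inputs or Outputs build their own configuration, so the site file would no longer need to hard-code that value. Because the override is process-wide, it shares the drawback the description notes for mixed-mode runs.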
[jira] [Updated] (TEZ-4012) Add docker support for Tez.
[ https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated TEZ-4012: -- Fix Version/s: (was: 0.10.1) > Add docker support for Tez. > --- > > Key: TEZ-4012 > URL: https://issues.apache.org/jira/browse/TEZ-4012 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Jonathan Turner Eagles >Priority: Major > Fix For: 0.9.2, 0.10.0 > > Attachments: TEZ-4012.001.patch, TEZ-4012.002.patch, > TEZ-4012.003.patch > > > Hadoop label builds contain a mix of development tools and versions. In > particular H11-H20 are unusable by tez since protoc -version is 2.6.x and > hadoop only supports 2.5.0. This jira will allow builds across all H1-H20 > jenkins machines. -- This message was sent by Atlassian Jira (v8.3.4#803005)