[ https://issues.apache.org/jira/browse/TEZ-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799281#comment-17799281 ]
Ayush Saxena commented on TEZ-4415:
-----------------------------------
Last time I tried this on our internal build. Today I tried it on Hadoop 3.3.6
and Tez 0.10.2, and *it works*:
{noformat}
ayushsaxena@ayushsaxena apache-tez-0.10.2-bin % $HADOOP_HOME/bin/hadoop archive -D mapreduce.framework.name=yarn-tez -archiveName abc.har -p /out /haro
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/Users/ayushsaxena/code/cluster/hadoop-3.3.6/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/Users/ayushsaxena/code/cluster/apache-tez-0.10.2-bin/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
2023-12-21 11:48:49,203 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2023-12-21 11:48:49,667 INFO client.DefaultNoHARMFailoverProxyProvider:
Connecting to ResourceManager at /0.0.0.0:8032
2023-12-21 11:48:49,931 INFO client.DefaultNoHARMFailoverProxyProvider:
Connecting to ResourceManager at /0.0.0.0:8032
2023-12-21 11:48:49,944 INFO client.DefaultNoHARMFailoverProxyProvider:
Connecting to ResourceManager at /0.0.0.0:8032
2023-12-21 11:48:50,011 INFO mapreduce.JobResourceUploader: Disabling Erasure
Coding for path:
/tmp/hadoop-yarn/staging/ayushsaxena/.staging/job_1703099201534_0008
2023-12-21 11:48:50,524 INFO mapreduce.JobSubmitter: number of splits:1
2023-12-21 11:48:50,607 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1703099201534_0008
2023-12-21 11:48:50,607 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-12-21 11:48:50,685 INFO client.YARNRunner: Number of stages: 2
2023-12-21 11:48:50,704 INFO conf.Configuration: resource-types.xml not found
2023-12-21 11:48:50,704 INFO resource.ResourceUtils: Unable to find
'resource-types.xml'.
2023-12-21 11:48:50,968 INFO counters.Limits: Counter limits initialized with
parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64,
MAX_COUNTERS=1200
2023-12-21 11:48:50,968 INFO counters.Limits: Counter limits initialized with
parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64,
MAX_COUNTERS=1200
2023-12-21 11:48:50,968 INFO client.TezClient: Tez Client Version: [
component=tez-api, version=0.10.2,
revision=72977b8720b2337ab0a0a3bf3b12e1c57900fa69,
SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git,
buildTime=2022-07-08T16:21:10Z, buildUser=laszlobodor,
buildJavaVersion=1.8.0_292 ]
2023-12-21 11:48:50,976 INFO client.DefaultNoHARMFailoverProxyProvider:
Connecting to ResourceManager at /0.0.0.0:8032
2023-12-21 11:48:50,977 INFO client.TezClient: Submitting DAG application with
id: application_1703099201534_0008
2023-12-21 11:48:50,978 INFO client.TezClientUtils: Using tez.lib.uris value
from configuration: hdfs://localhost:52714/apps/tez/tez.tar.gz
2023-12-21 11:48:50,978 INFO client.TezClientUtils: Using
tez.lib.uris.classpath value from configuration: null
2023-12-21 11:48:50,986 INFO client.TezClient: Tez system stage directory
hdfs://localhost:52714/tmp/hadoop-yarn/staging/ayushsaxena/.staging/job_1703099201534_0008/.tez/application_1703099201534_0008
doesn't exist and is created
2023-12-21 11:48:51,075 INFO client.TezClient: Submitting DAG to YARN,
applicationId=application_1703099201534_0008, dagName=hadoop-archives-3.3.6.jar
2023-12-21 11:48:51,103 INFO impl.YarnClientImpl: Submitted application
application_1703099201534_0008
2023-12-21 11:48:51,105 INFO client.TezClient: The url to track the Tez AM:
http://localhost:8088/proxy/application_1703099201534_0008/
2023-12-21 11:48:54,254 INFO client.DefaultNoHARMFailoverProxyProvider:
Connecting to ResourceManager at /0.0.0.0:8032
2023-12-21 11:48:54,274 INFO mapreduce.Job: The url to track the job:
http://localhost:8088/proxy/application_1703099201534_0008/
2023-12-21 11:48:54,275 INFO mapreduce.Job: Running job: job_1703099201534_0008
2023-12-21 11:48:55,292 INFO mapreduce.Job: Job job_1703099201534_0008 running
in uber mode : false
2023-12-21 11:48:55,296 INFO mapreduce.Job: map 0% reduce 0%
2023-12-21 11:48:58,326 INFO mapreduce.Job: map 100% reduce 100%
2023-12-21 11:48:58,330 INFO mapreduce.Job: Job job_1703099201534_0008
completed successfully
2023-12-21 11:48:58,339 INFO mapreduce.Job: Counters: 0
ayushsaxena@ayushsaxena apache-tez-0.10.2-bin % hdfs dfs -ls /haro/abc.har
zsh: command not found: hdfs
ayushsaxena@ayushsaxena apache-tez-0.10.2-bin % $HADOOP_HOME/bin/hdfs dfs -ls /haro/abc.har
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/Users/ayushsaxena/code/cluster/hadoop-3.3.6/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/Users/ayushsaxena/code/cluster/apache-tez-0.10.2-bin/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
2023-12-21 11:49:53,241 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
Found 2 items
-rw-r--r-- 1 ayushsaxena supergroup 0 2023-12-21 11:48
/haro/abc.har/_SUCCESS
-rw-r--r-- 3 ayushsaxena supergroup 8179 2023-12-21 11:48
/haro/abc.har/part-0{noformat}
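
To check an archive directory for the `_index` and `_masterindex` files named in the issue below, the `-ls` output can be filtered. This is only a minimal sketch: the function name `missing_har_index_files` is hypothetical, and it assumes each listing entry ends with the full path on the same line, as in unwrapped `hdfs dfs -ls` output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or Tez): reads `hdfs dfs -ls <dir>.har`
# output on stdin and prints the HAR index file names absent from the listing.
missing_har_index_files() {
  listing=$(cat)
  missing=""
  for name in _index _masterindex; do
    # The name must be the last path component at the end of a line,
    # so "/_index" does not match ".../_masterindex".
    if ! printf '%s\n' "$listing" | grep -q "/${name}\$"; then
      missing="${missing}${missing:+ }${name}"
    fi
  done
  # Prints an empty line when both index files are present.
  printf '%s\n' "$missing"
}
```

It could be used as `$HADOOP_HOME/bin/hdfs dfs -ls /haro/abc.har | missing_har_index_files`. An empty line means both index files were found in the listing.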
> Hadoop archives created with Tez miss index files
> -------------------------------------------------
>
> Key: TEZ-4415
> URL: https://issues.apache.org/jira/browse/TEZ-4415
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.2
> Reporter: Christophe Préaud
> Priority: Minor
>
> When a hadoop archive is created with Tez, the _index and _masterindex files
> are not created:
> {code:java}
> # create hadoop archive with Tez
> hadoop archive -D mapreduce.framework.name=yarn-tez -archiveName data.har -p /user/preaudc/data /user/preaudc
> (...)
> 22/05/23 13:04:39 INFO client.TezClient: Tez Client Version: [
> component=tez-api, version=0.9.2,
> revision=10cb3519bd34389210e6511a2ba291b52dcda081,
> SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git,
> buildTime=2019-03-19T20:44:07Z ]
> (...)
> # _index and _masterindex files are not created
> hdfs dfs -ls /user/preaudc/data.har
> Found 2 items
> -rw-r--r-- 3 preaudc preaudc 0 2022-05-23 13:06
> /user/preaudc/data.har/_SUCCESS
> -rw-r--r-- 3 preaudc preaudc 2537147461 2022-05-23 13:06
> /user/preaudc/data.har/part-0
> # the hadoop archive is thus unreadable
> hdfs dfs -ls har:/user/preaudc/data.har
> ls: Invalid path for the Har Filesystem. No index file in
> har:/user/preaudc/data.har{code}