[ 
https://issues.apache.org/jira/browse/TEZ-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781260#comment-17781260
 ] 

Ayush Saxena commented on TEZ-4415:
-----------------------------------

This is reproducible for me as well:

{code:java}
[hdfs@ayushsaxena-3 root]$ hdfs dfs -ls /dataq.har
Found 2 items
-rw-r--r--   2 hdfs supergroup          0 2023-10-31 07:28 /dataq.har/_SUCCESS
-rw-r--r--   3 hdfs supergroup          0 2023-10-31 07:28 /dataq.har/part-0
[hdfs@ayushsaxena-3 root]$ hdfs dfs -ls har:/dataq.har
23/10/31 07:29:21 WARN fs.FileSystem: Failed to initialize fileystem har:/dataq.har: java.io.IOException: Invalid path for the Har Filesystem. No index file in har:/dataq.har
ls: Invalid path for the Har Filesystem. No index file in har:/dataq.har
{code}

It looks like the reducer isn't running when the archive job runs on Tez.
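
For anyone who wants to confirm the same symptom on their own archive, here is a minimal sketch that reads an `hdfs dfs -ls` listing on stdin and prints which index files are absent. The function name `missing_har_index_files` is hypothetical; the two file names checked (`_index` and `_masterindex`) are the ones named in the issue description, and the check assumes each listing line ends with the file path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop or Tez): reads the output of
# `hdfs dfs -ls <archive>.har` on stdin and prints the name of each
# expected HAR index file that does not appear in the listing.
missing_har_index_files() {
  local listing name
  listing="$(cat)"
  for name in _index _masterindex; do
    # Match the file name only as the final path component of a line.
    if ! printf '%s\n' "$listing" | grep -q "/${name}\$"; then
      echo "$name"
    fi
  done
}

# Usage (assumes an HDFS client on the PATH):
#   hdfs dfs -ls /dataq.har | missing_har_index_files
```

Empty output means both index files are listed; any name printed is missing from the archive directory.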

> Hadoop archives created with Tez miss index files
> -------------------------------------------------
>
>                 Key: TEZ-4415
>                 URL: https://issues.apache.org/jira/browse/TEZ-4415
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.2
>            Reporter: Christophe Préaud
>            Priority: Minor
>
> When a hadoop archive is created with Tez, the _index and _masterindex files 
> are not created:
> {code:java}
> # create hadoop archive with Tez
> hadoop archive -D mapreduce.framework.name=yarn-tez -archiveName data.har -p /user/preaudc/data /user/preaudc
> (...)
> 22/05/23 13:04:39 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.2, revision=10cb3519bd34389210e6511a2ba291b52dcda081, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2019-03-19T20:44:07Z ]
> (...)
> # _index and _masterindex files are not created
> hdfs dfs -ls /user/preaudc/data.har
> Found 2 items
> -rw-r--r--   3 preaudc preaudc          0 2022-05-23 13:06 /user/preaudc/data.har/_SUCCESS
> -rw-r--r--   3 preaudc preaudc 2537147461 2022-05-23 13:06 /user/preaudc/data.har/part-0
> # the hadoop archive is thus unreadable
> hdfs dfs -ls har:/user/preaudc/data.har
> ls: Invalid path for the Har Filesystem. No index file in har:/user/preaudc/data.har
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
