[jira] [Commented] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager
[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682259#comment-16682259 ]

Yuming Wang commented on SPARK-26000:
-------------------------------------

It is not a Spark issue. Maybe you need to increase {{dfs.datanode.handler.count}}.

> Missing block when reading HDFS Data from Cloudera Manager
> ----------------------------------------------------------
>
>                 Key: SPARK-26000
>                 URL: https://issues.apache.org/jira/browse/SPARK-26000
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.2
>            Reporter: john
>            Priority: Major
>
> I am able to write to Cloudera Manager HDFS through Open Source Spark, which
> runs separately, but I am not able to read the Cloudera Manager HDFS data.
>
> I am getting missing block location, socketTimeOut.
>
> spark.read().textFile(path_to_file)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
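For readers following up on the suggestion to increase {{dfs.datanode.handler.count}}: the fragment below is a minimal sketch of how that property might be set in hdfs-site.xml on the DataNodes. The value 64 is purely illustrative and does not come from this thread (the stock Hadoop default is 10). On a cluster managed by Cloudera Manager, the setting would presumably be changed through the Cloudera Manager configuration pages rather than by editing the file by hand, and the DataNodes would need a restart to pick it up.

```xml
<!-- hdfs-site.xml on each DataNode (sketch; the value is an assumption, not a recommendation) -->
<property>
  <name>dfs.datanode.handler.count</name>
  <!-- Number of DataNode server threads; 64 is a hypothetical example value. -->
  <value>64</value>
</property>
```

Whether a larger handler count resolves the read timeouts reported here is not confirmed in the thread; it is only the first thing the commenter suggests trying.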
[jira] [Commented] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager
[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682163#comment-16682163 ]

john commented on SPARK-26000:
------------------------------

I have Cloudera Manager in Environment A, which has the HDFS component, and Spark in Environment B. I am doing a very simple read and write to/from HDFS. Writing to the Cloudera Manager HDFS works as expected, but when reading back I am getting the issues below:

"java.lang.reflect.InvocationTargetException"
Caused By: "org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;"
Caused By: "java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/SparkNode_IP_PORT_NoO remote=/NameNode:50010:"

Sample code (Java):

// writing
spark.write().mode("append").format("parquet").save(path_to_file);
// read
spark.read().parquet(path_to_file);
[jira] [Commented] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager
[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682151#comment-16682151 ]

Yuming Wang commented on SPARK-26000:
-------------------------------------

Could you provide more information?