[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682163#comment-16682163 ]
john commented on SPARK-26000:
------------------------------

I have Cloudera Manager in Environment A, which hosts the HDFS component, and Spark in Environment B. I am doing a very simple read and write to/from HDFS. Writing to the Cloudera Manager HDFS works as expected, but when reading the data back I get the following errors:

"java.lang.reflect.InvocationTargetException"
Caused by: "org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;"
Caused by: "java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/SparkNode_IP_PORT_NoO remote=/NameNode:50010:"

Java sample code:

    // writing
    spark.write().mode("append").format("parquet").save(path_to_file);
    // read
    spark.read().parquet(path_to_file);

> Missing block when reading HDFS Data from Cloudera Manager
> ----------------------------------------------------------
>
>                 Key: SPARK-26000
>                 URL: https://issues.apache.org/jira/browse/SPARK-26000
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.2
>            Reporter: john
>            Priority: Major
>
> I am able to write to the Cloudera Manager HDFS through open source Spark, which
> runs separately, but I am not able to read the Cloudera Manager HDFS data.
>
> I am getting a missing block location error and a socket timeout.
>
> spark.read().textFile(path_to_file)