[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682163#comment-16682163 ]

john commented on SPARK-26000:
------------------------------

I have Cloudera Manager in environment A, which runs the HDFS component, and
Spark in environment B. I am doing a very simple read and write to and from
HDFS. Writing to the Cloudera Manager HDFS works as expected, but when reading
the data back I get the following errors:

java.lang.reflect.InvocationTargetException
Caused by: org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;

Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/SparkNode_IP_PORT_NoO remote=/NameNode:50010:
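The SocketTimeoutException is raised while the HDFS client waits on port 50010, which in Hadoop 2.x is the default DataNode data-transfer port (dfs.datanode.address), so the "remote" address in that log line is most likely a DataNode rather than the NameNode. One way to narrow this down is to check from the Spark node in environment B whether that port can be reached at all. The following is a minimal sketch in plain Java; PortProbe and canConnect are hypothetical names, the 5-second timeout is an arbitrary choice, and a successful connect only shows that TCP reaches the port, not that HDFS block reads will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper: checks whether a plain TCP connection to
// host:port can be opened within timeoutMs. It does not speak the HDFS
// data-transfer protocol; it only tests network reachability.
public class PortProbe {

    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers SocketTimeoutException (filtered/unroutable),
            // ConnectException (refused) and UnknownHostException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortProbe <datanode-host> [port]; 50010 is assumed
        // to be the DataNode port, as in the log above.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
        boolean ok = canConnect(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable"));
    }
}
```

Run it from the Spark node once per DataNode host; if the probe fails or hangs for the hosts HDFS hands back to the client, the timeout is a network-path problem between environments A and B rather than a Spark or Parquet problem.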

Sample Java code:

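import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// writing: write() is called on the Dataset being saved ("df" is an
// illustrative name); SparkSession itself has no write() method
df.write().mode("append").format("parquet").save(path_to_file);

// reading back
Dataset<Row> result = spark.read().parquet(path_to_file);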

> Missing block when reading HDFS Data from Cloudera Manager
> ----------------------------------------------------------
>
>                 Key: SPARK-26000
>                 URL: https://issues.apache.org/jira/browse/SPARK-26000
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.2
>            Reporter: john
>            Priority: Major
>
> I am able to write to Cloudera Manager HDFS through open-source Spark, which
> runs separately, but I am not able to read the Cloudera Manager HDFS data.
>  
> I am getting missing block location and socket timeout errors.
>  
> spark.read().textFile(path_to_file)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
