[jira] [Commented] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager

2018-11-09 Thread Yuming Wang (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682259#comment-16682259 ]

Yuming Wang commented on SPARK-26000:
-

It is not a Spark issue. Maybe you need to increase
{{dfs.datanode.handler.count}}.
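A minimal sketch, assuming the job is launched with spark-submit from the separate Spark environment. Spark copies any property prefixed with spark.hadoop. into the Hadoop Configuration used by its HDFS client, so client-side HDFS settings can be passed that way. Note that dfs.datanode.handler.count is a DataNode setting, so it would have to be raised in hdfs-site.xml on the Cloudera cluster itself (for example through Cloudera Manager), not through Spark. The two client properties below (dfs.client.socket-timeout, whose default is 60000 ms, and dfs.client.use.datanode.hostname) and their values are assumptions for illustration, not a confirmed fix for this issue; the class name HdfsClientConfSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns Hadoop client properties into spark-submit
// --conf arguments using Spark's "spark.hadoop." pass-through prefix.
public class HdfsClientConfSketch {
    static final String PREFIX = "spark.hadoop.";

    // Spark strips this prefix and copies the rest into the Hadoop Configuration.
    public static String toSparkKey(String hadoopKey) {
        return PREFIX + hadoopKey;
    }

    // Builds "--conf key=value" pairs, space-separated, in map iteration order.
    public static String toSubmitArgs(Map<String, String> hadoopProps) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : hadoopProps.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("--conf ")
              .append(toSparkKey(e.getKey()))
              .append('=')
              .append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumed value: raise the HDFS client read timeout above its 60000 ms default.
        props.put("dfs.client.socket-timeout", "120000");
        // Assumption: connect to DataNodes by hostname rather than the IP they
        // advertise, which may help when cluster-internal IPs are not routable.
        props.put("dfs.client.use.datanode.hostname", "true");
        System.out.println(toSubmitArgs(props));
    }
}
```

Running it prints the two --conf arguments, which could be appended to the spark-submit command line; the same keys could equally go into spark-defaults.conf.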

> Missing block when reading HDFS Data from Cloudera Manager
> --
>
> Key: SPARK-26000
> URL: https://issues.apache.org/jira/browse/SPARK-26000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.2
>Reporter: john
>Priority: Major
>
> I am able to write to Cloudera Manager HDFS through open-source Spark, which
> runs separately, but I am not able to read the Cloudera Manager HDFS data.
>  
> I am getting missing block location and SocketTimeout errors.
>  
> spark.read().textFile(path_to_file);



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager

2018-11-09 Thread john (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682163#comment-16682163 ]

john commented on SPARK-26000:
--

I have Cloudera Manager in environment A, which has the HDFS component, and Spark
in environment B. I am doing a very simple read and write to/from HDFS. Writing
to the Cloudera Manager HDFS works as expected, but when reading back I am
getting the following errors:

"java.lang.reflect.InvocationTargetException" Caused By: 
"org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It 
must be specified manually.;"

Caused By: "java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/SparkNode_IP_PORT_NoO 
remote=/NameNode:50010:"

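Java sample code:

// write (df is a hypothetical name for the Dataset<Row> being saved;
// SparkSession itself has no write() method)
df.write().mode("append").format("parquet").save(path_to_file);

// read
Dataset<Row> result = spark.read().parquet(path_to_file);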




[jira] [Commented] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager

2018-11-09 Thread Yuming Wang (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682151#comment-16682151 ]

Yuming Wang commented on SPARK-26000:
-

Could you provide more information?
