Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-17 Thread Divya Gehlot
Can you please check order of all the data set of union all operations.


Are they in same order ?

On 9 August 2016 at 02:47, max square  wrote:

> Hey guys,
>
> I'm trying to save Dataframe in CSV format after performing unionAll
> operations on it.
> But I get this exception -
>
> Exception in thread "main" 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
> execute, tree:
> TungstenExchange hashpartitioning(mId#430,200)
>
> I'm saving it by
>
> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir +
> latest)
>
> It works perfectly if I don't do the unionAll operation.
> I see that the format isn't different by printing the part of the results.
>
> Any help regarding this would be appreciated.
>
>


Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-17 Thread max square
Thanks Harsh for the reply.

When I change the code to something like this -

  def saveAsLatest(df: DataFrame, fileSystem: FileSystem, bakDir: String) =
{

fileSystem.rename(new Path(bakDir + latest), new Path(bakDir + "/" +
ScalaUtil.currentDateTimeString))

fileSystem.create(new Path(bakDir + latest))


df.write.format("com.databricks.spark.csv").mode("Overwrite").options(Map("mode"
-> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir +
latest)

  }

It still shows the same error. Also, if I don't do the unionAll operation,
the operation is successful with the previous code.

Anything else that I should be checking?




On Wed, Aug 17, 2016 at 1:32 PM, HARSH TAKKAR  wrote:

> Hi
>
> I can see that exception is caused by following, csn you check where in
> your code you are using this path
>
> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path
> does not exist: hdfs://testcluster:8020/experiments/vol/spark_chomp_
> data/bak/restaurants-bak/latest
>
> On Wed, 17 Aug 2016, 10:57 p.m. max square, 
> wrote:
>
>> /bump
>>
>> It'd be great if someone can point me to the correct direction.
>>
>> On Mon, Aug 8, 2016 at 5:07 PM, max square 
>> wrote:
>>
>>> Here's the complete stacktrace - https://gist.github.com/rohann/
>>> 649b0fcc9d5062ef792eddebf5a315c1
>>>
>>> For reference, here's the complete function -
>>>
>>>   def saveAsLatest(df: DataFrame, fileSystem: FileSystem, bakDir:
>>> String) = {
>>>
>>> fileSystem.rename(new Path(bakDir + latest), new Path(bakDir + "/"
>>> + ScalaUtil.currentDateTimeString))
>>>
>>> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
>>> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir
>>> + latest)
>>>
>>>   }
>>>
>>> On Mon, Aug 8, 2016 at 3:41 PM, Ted Yu  wrote:
>>>
 Mind showing the complete stack trace ?

 Thanks

 On Mon, Aug 8, 2016 at 12:30 PM, max square 
 wrote:

> Thanks Ted for the prompt reply.
>
> There are three or four DFs that are coming from various sources and
> I'm doing a unionAll on them.
>
> val placesProcessed = placesUnchanged.unionAll(place
> sAddedWithMerchantId).unionAll(placesUpdatedFromHotelsWithMerchantId
> ).unionAll(placesUpdatedFromRestaurantsWithMerchantId).unionAll(
> placesChanged)
>
> I'm using Spark 1.6.2.
>
> On Mon, Aug 8, 2016 at 3:11 PM, Ted Yu  wrote:
>
>> Can you show the code snippet for unionAll operation ?
>>
>> Which Spark release do you use ?
>>
>> BTW please use user@spark.apache.org in the future.
>>
>> On Mon, Aug 8, 2016 at 11:47 AM, max square 
>> wrote:
>>
>>> Hey guys,
>>>
>>> I'm trying to save Dataframe in CSV format after performing unionAll
>>> operations on it.
>>> But I get this exception -
>>>
>>> Exception in thread "main" org.apache.spark.sql.catalyst.
>>> errors.package$TreeNodeException: execute, tree:
>>> TungstenExchange hashpartitioning(mId#430,200)
>>>
>>> I'm saving it by
>>>
>>> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
>>> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir
>>> + latest)
>>>
>>> It works perfectly if I don't do the unionAll operation.
>>> I see that the format isn't different by printing the part of the
>>> results.
>>>
>>> Any help regarding this would be appreciated.
>>>
>>>
>>
>

>>>
>>


Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-17 Thread HARSH TAKKAR
Hi

I can see that exception is caused by following, csn you check where in
your code you are using this path

Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does
not exist:
hdfs://testcluster:8020/experiments/vol/spark_chomp_data/bak/restaurants-bak/latest

On Wed, 17 Aug 2016, 10:57 p.m. max square,  wrote:

> /bump
>
> It'd be great if someone can point me to the correct direction.
>
> On Mon, Aug 8, 2016 at 5:07 PM, max square 
> wrote:
>
>> Here's the complete stacktrace -
>> https://gist.github.com/rohann/649b0fcc9d5062ef792eddebf5a315c1
>>
>> For reference, here's the complete function -
>>
>>   def saveAsLatest(df: DataFrame, fileSystem: FileSystem, bakDir:
>> String) = {
>>
>> fileSystem.rename(new Path(bakDir + latest), new Path(bakDir + "/" +
>> ScalaUtil.currentDateTimeString))
>>
>> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
>> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir +
>> latest)
>>
>>   }
>>
>> On Mon, Aug 8, 2016 at 3:41 PM, Ted Yu  wrote:
>>
>>> Mind showing the complete stack trace ?
>>>
>>> Thanks
>>>
>>> On Mon, Aug 8, 2016 at 12:30 PM, max square 
>>> wrote:
>>>
 Thanks Ted for the prompt reply.

 There are three or four DFs that are coming from various sources and
 I'm doing a unionAll on them.

 val placesProcessed = placesUnchanged.unionAll(
 placesAddedWithMerchantId).unionAll(
 placesUpdatedFromHotelsWithMerchantId).unionAll(
 placesUpdatedFromRestaurantsWithMerchantId).unionAll(placesChanged)

 I'm using Spark 1.6.2.

 On Mon, Aug 8, 2016 at 3:11 PM, Ted Yu  wrote:

> Can you show the code snippet for unionAll operation ?
>
> Which Spark release do you use ?
>
> BTW please use user@spark.apache.org in the future.
>
> On Mon, Aug 8, 2016 at 11:47 AM, max square 
> wrote:
>
>> Hey guys,
>>
>> I'm trying to save Dataframe in CSV format after performing unionAll
>> operations on it.
>> But I get this exception -
>>
>> Exception in thread "main"
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
>> tree:
>> TungstenExchange hashpartitioning(mId#430,200)
>>
>> I'm saving it by
>>
>> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
>> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir
>> + latest)
>>
>> It works perfectly if I don't do the unionAll operation.
>> I see that the format isn't different by printing the part of the
>> results.
>>
>> Any help regarding this would be appreciated.
>>
>>
>

>>>
>>
>


Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-17 Thread max square
/bump

It'd be great if someone can point me to the correct direction.

On Mon, Aug 8, 2016 at 5:07 PM, max square  wrote:

> Here's the complete stacktrace - https://gist.github.com/rohann/
> 649b0fcc9d5062ef792eddebf5a315c1
>
> For reference, here's the complete function -
>
>   def saveAsLatest(df: DataFrame, fileSystem: FileSystem, bakDir: String)
> = {
>
> fileSystem.rename(new Path(bakDir + latest), new Path(bakDir + "/" +
> ScalaUtil.currentDateTimeString))
>
> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir +
> latest)
>
>   }
>
> On Mon, Aug 8, 2016 at 3:41 PM, Ted Yu  wrote:
>
>> Mind showing the complete stack trace ?
>>
>> Thanks
>>
>> On Mon, Aug 8, 2016 at 12:30 PM, max square 
>> wrote:
>>
>>> Thanks Ted for the prompt reply.
>>>
>>> There are three or four DFs that are coming from various sources and I'm
>>> doing a unionAll on them.
>>>
>>> val placesProcessed = placesUnchanged.unionAll(placesAddedWithMerchantId
>>> ).unionAll(placesUpdatedFromHotelsWithMerchantId).unionAll(placesUpdat
>>> edFromRestaurantsWithMerchantId).unionAll(placesChanged)
>>>
>>> I'm using Spark 1.6.2.
>>>
>>> On Mon, Aug 8, 2016 at 3:11 PM, Ted Yu  wrote:
>>>
 Can you show the code snippet for unionAll operation ?

 Which Spark release do you use ?

 BTW please use user@spark.apache.org in the future.

 On Mon, Aug 8, 2016 at 11:47 AM, max square 
 wrote:

> Hey guys,
>
> I'm trying to save Dataframe in CSV format after performing unionAll
> operations on it.
> But I get this exception -
>
> Exception in thread "main" org.apache.spark.sql.catalyst.
> errors.package$TreeNodeException: execute, tree:
> TungstenExchange hashpartitioning(mId#430,200)
>
> I'm saving it by
>
> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir
> + latest)
>
> It works perfectly if I don't do the unionAll operation.
> I see that the format isn't different by printing the part of the
> results.
>
> Any help regarding this would be appreciated.
>
>

>>>
>>
>


Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-08 Thread Ted Yu
Mind showing the complete stack trace ?

Thanks

On Mon, Aug 8, 2016 at 12:30 PM, max square  wrote:

> Thanks Ted for the prompt reply.
>
> There are three or four DFs that are coming from various sources and I'm
> doing a unionAll on them.
>
> val placesProcessed = placesUnchanged.unionAll(placesAddedWithMerchantId).
> unionAll(placesUpdatedFromHotelsWithMerchantId).unionAll(pla
> cesUpdatedFromRestaurantsWithMerchantId).unionAll(placesChanged)
>
> I'm using Spark 1.6.2.
>
> On Mon, Aug 8, 2016 at 3:11 PM, Ted Yu  wrote:
>
>> Can you show the code snippet for unionAll operation ?
>>
>> Which Spark release do you use ?
>>
>> BTW please use user@spark.apache.org in the future.
>>
>> On Mon, Aug 8, 2016 at 11:47 AM, max square 
>> wrote:
>>
>>> Hey guys,
>>>
>>> I'm trying to save Dataframe in CSV format after performing unionAll
>>> operations on it.
>>> But I get this exception -
>>>
>>> Exception in thread "main" org.apache.spark.sql.catalyst.
>>> errors.package$TreeNodeException: execute, tree:
>>> TungstenExchange hashpartitioning(mId#430,200)
>>>
>>> I'm saving it by
>>>
>>> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
>>> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir
>>> + latest)
>>>
>>> It works perfectly if I don't do the unionAll operation.
>>> I see that the format isn't different by printing the part of the
>>> results.
>>>
>>> Any help regarding this would be appreciated.
>>>
>>>
>>
>


Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-08 Thread Ted Yu
Can you show the code snippet for unionAll operation ?

Which Spark release do you use ?

BTW please use user@spark.apache.org in the future.

On Mon, Aug 8, 2016 at 11:47 AM, max square  wrote:

> Hey guys,
>
> I'm trying to save Dataframe in CSV format after performing unionAll
> operations on it.
> But I get this exception -
>
> Exception in thread "main" 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
> execute, tree:
> TungstenExchange hashpartitioning(mId#430,200)
>
> I'm saving it by
>
> df.write.format("com.databricks.spark.csv").options(Map("mode" ->
> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true")).save(bakDir +
> latest)
>
> It works perfectly if I don't do the unionAll operation.
> I see that the format isn't different by printing the part of the results.
>
> Any help regarding this would be appreciated.
>
>