Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Abhijeet Kumar
The situation is completely different from what you are thinking. OK,
thanks for your time. From now on I'll figure this out myself. Thank you again!

On Sat, 9 Jun 2018, 13:27 Jörn Franke wrote:

> Why don’t you write the final name from the start?
> I.e., save the file under the name it should have.
>
> On 9. Jun 2018, at 09:44, Abhijeet Kumar 
> wrote:
>
> I need to rename the file. I can write a separate program for this, I
> think.
>
> Thanks,
> Abhijeet Kumar
>
> On 09-Jun-2018, at 1:10 PM, Jörn Franke  wrote:
>
> That would be an anti pattern and would lead to bad software.
> Please don’t do it for the sake of the people that use your software.
> What do you exactly want to achieve with the information if the file
> exists or not?
>
> On 9. Jun 2018, at 08:34, Abhijeet Kumar 
> wrote:
>
> Can you please tell the estimated time. So, that my program will wait for
> that time period.
>
> Thanks,
> Abhijeet Kumar
>
> On 09-Jun-2018, at 12:01 PM, Jörn Franke  wrote:
>
> You need some time until the information of the file creation is
> propagated.
>
> On 9. Jun 2018, at 08:07, Abhijeet Kumar 
> wrote:
>
> I'm modifying a CSV file which is inside HDFS and finally putting it back
> to HDFS in Spark.
>
> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
> csv_file.coalesce(1).write
>   .format("csv")
>   .mode("overwrite")
>   .save("hdfs://localhost:8020/data/temp_insight")
> Thread.sleep(15000)
> println(fs.exists(new Path("/data/temp_insight")))
>
> Output:
>
> false
>
> While the thread was sleeping for 15 seconds, I checked HDFS with the
> command
>
> hdfs dfs -ls /data/temp_insight
>
> Output:
>
> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> -rw-r--r--   3 abhijeet supergroup          0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
> -rw-r--r--   3 abhijeet supergroup        201 2018-06-08 17:48 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>
> To cross verify whether it is taking the path of hdfs or not I have added
> one more println statement in my code, providing the path which is already
> there in HDFS. It's showing true in that case.
>
> So, what could be the reason?
> Thanks,
>
> Abhijeet Kumar
>
>
>
>


Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Jörn Franke
Why don’t you write the final name from the start?
I.e., save the file under the name it should have.

> On 9. Jun 2018, at 09:44, Abhijeet Kumar  wrote:
> 
> I need to rename the file. I can write a separate program for this, I think.
> 
> Thanks,
> Abhijeet Kumar 
>> On 09-Jun-2018, at 1:10 PM, Jörn Franke  wrote:
>> 
>> That would be an anti pattern and would lead to bad software.
>> Please don’t do it for the sake of the people that use your software.
>> What do you exactly want to achieve with the information if the file exists 
>> or not?
>> 
>>> On 9. Jun 2018, at 08:34, Abhijeet Kumar  
>>> wrote:
>>> 
>>> Can you please tell the estimated time. So, that my program will wait for 
>>> that time period.
>>> 
>>> Thanks,
>>> Abhijeet Kumar
>>>> On 09-Jun-2018, at 12:01 PM, Jörn Franke wrote:
>>>>
>>>> You need some time until the information of the file creation is
>>>> propagated.
>>>>
>>>>> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
>>>>>
>>>>> I'm modifying a CSV file which is inside HDFS and finally putting it back
>>>>> to HDFS in Spark.
>>>>>
>>>>> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>>>>> csv_file.coalesce(1).write
>>>>>   .format("csv")
>>>>>   .mode("overwrite")
>>>>>   .save("hdfs://localhost:8020/data/temp_insight")
>>>>> Thread.sleep(15000)
>>>>> println(fs.exists(new Path("/data/temp_insight")))
>>>>>
>>>>> Output:
>>>>>
>>>>> false
>>>>>
>>>>> While the thread was sleeping for 15 seconds, I checked HDFS with the
>>>>> command
>>>>>
>>>>> hdfs dfs -ls /data/temp_insight
>>>>>
>>>>> Output:
>>>>>
>>>>> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform... using builtin-java classes where applicable
>>>>> -rw-r--r--   3 abhijeet supergroup          0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
>>>>> -rw-r--r--   3 abhijeet supergroup        201 2018-06-08 17:48 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>>>>>
>>>>> To cross verify whether it is taking the path of hdfs or not I have added
>>>>> one more println statement in my code, providing the path which is
>>>>> already there in HDFS. It's showing true in that case.
>>>>>
>>>>> So, what could be the reason?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Abhijeet Kumar


Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Abhijeet Kumar
I need to rename the file. I can write a separate program for this, I think.
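
A minimal sketch of such a rename step, assuming the output directory holds exactly one part-*.csv file (as it does after coalesce(1)). The names RenamePartFile and renameSinglePart, and any target file name passed in, are hypothetical and not from this thread; only the Hadoop FileSystem calls (getFileSystem, globStatus, exists, delete, rename) are real API.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

object RenamePartFile {
  /** Moves the single part-*.csv file under outputDir to finalFile.
    * Returns true if the rename succeeded. (Hypothetical helper.)
    */
  def renameSinglePart(conf: Configuration, outputDir: String, finalFile: String): Boolean = {
    val dir = new Path(outputDir)
    // Resolve the filesystem from the path's own URI instead of relying on fs.defaultFS.
    val fs = dir.getFileSystem(conf)
    // globStatus may return null, so wrap it before use.
    val parts = Option(fs.globStatus(new Path(dir, "part-*.csv")))
      .getOrElse(Array.empty[FileStatus])
    require(parts.length == 1, s"expected exactly one part file under $dir, found ${parts.length}")
    val target = new Path(finalFile)
    // Remove a stale target first so the rename is not blocked by an existing file.
    if (fs.exists(target)) fs.delete(target, false)
    fs.rename(parts.head.getPath, target)
  }
}
```

It would be called after save has returned, for example as RenamePartFile.renameSinglePart(spark.sparkContext.hadoopConfiguration, "hdfs://localhost:8020/data/temp_insight", target), where target is whatever final path is wanted.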

Thanks,
Abhijeet Kumar 
> On 09-Jun-2018, at 1:10 PM, Jörn Franke  wrote:
> 
> That would be an anti pattern and would lead to bad software.
> Please don’t do it for the sake of the people that use your software.
> What do you exactly want to achieve with the information if the file exists 
> or not?
> 
> On 9. Jun 2018, at 08:34, Abhijeet Kumar wrote:
> 
>> Can you please tell the estimated time. So, that my program will wait for 
>> that time period.
>> 
>> Thanks,
>> Abhijeet Kumar
>>> On 09-Jun-2018, at 12:01 PM, Jörn Franke wrote:
>>> 
>>> You need some time until the information of the file creation is propagated.
>>> 
>>> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
>>>
>>>> I'm modifying a CSV file which is inside HDFS and finally putting it back
>>>> to HDFS in Spark.
>>>>
>>>> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>>>> csv_file.coalesce(1).write
>>>>   .format("csv")
>>>>   .mode("overwrite")
>>>>   .save("hdfs://localhost:8020/data/temp_insight")
>>>> Thread.sleep(15000)
>>>> println(fs.exists(new Path("/data/temp_insight")))
>>>>
>>>> Output:
>>>>
>>>> false
>>>>
>>>> While the thread was sleeping for 15 seconds, I checked HDFS with the
>>>> command
>>>>
>>>> hdfs dfs -ls /data/temp_insight
>>>>
>>>> Output:
>>>>
>>>> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> -rw-r--r--   3 abhijeet supergroup          0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
>>>> -rw-r--r--   3 abhijeet supergroup        201 2018-06-08 17:48 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>>>>
>>>> To cross verify whether it is taking the path of hdfs or not I have added
>>>> one more println statement in my code, providing the path which is already
>>>> there in HDFS. It's showing true in that case.
>>>>
>>>> So, what could be the reason?
>>>>
>>>> Thanks,
>>>>
>>>> Abhijeet Kumar



Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Jörn Franke
That would be an anti-pattern and would lead to bad software.
Please don’t do it, for the sake of the people who use your software.
What exactly do you want to achieve by knowing whether the file exists or
not?

> On 9. Jun 2018, at 08:34, Abhijeet Kumar  wrote:
> 
> Can you please tell the estimated time. So, that my program will wait for 
> that time period.
> 
> Thanks,
> Abhijeet Kumar
>> On 09-Jun-2018, at 12:01 PM, Jörn Franke  wrote:
>> 
>> You need some time until the information of the file creation is propagated.
>> 
>>> On 9. Jun 2018, at 08:07, Abhijeet Kumar  
>>> wrote:
>>> 
>>> I'm modifying a CSV file which is inside HDFS and finally putting it back 
>>> to HDFS in Spark.
>>>
>>> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>>> csv_file.coalesce(1).write
>>>   .format("csv")
>>>   .mode("overwrite")
>>>   .save("hdfs://localhost:8020/data/temp_insight")
>>> Thread.sleep(15000)
>>> println(fs.exists(new Path("/data/temp_insight")))
>>>
>>> Output:
>>>
>>> false
>>>
>>> While the thread was sleeping for 15 seconds, I checked HDFS with the
>>> command
>>>
>>> hdfs dfs -ls /data/temp_insight
>>>
>>> Output:
>>>
>>> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> -rw-r--r--   3 abhijeet supergroup          0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
>>> -rw-r--r--   3 abhijeet supergroup        201 2018-06-08 17:48 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>>>
>>> To cross verify whether it is taking the path of hdfs or not I have added 
>>> one more println statement in my code, providing the path which is already 
>>> there in HDFS. It's showing true in that case.
>>> 
>>> So, what could be the reason?
>>> 
>>> Thanks,
>>> 
>>> Abhijeet Kumar
> 


Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Abhijeet Kumar
Can you please tell me the estimated time, so that my program can wait for
that period?

Thanks,
Abhijeet Kumar
> On 09-Jun-2018, at 12:01 PM, Jörn Franke  wrote:
> 
> You need some time until the information of the file creation is propagated.
> 
> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
> 
>> I'm modifying a CSV file which is inside HDFS and finally putting it back to 
>> HDFS in Spark.
>>
>> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>> csv_file.coalesce(1).write
>>   .format("csv")
>>   .mode("overwrite")
>>   .save("hdfs://localhost:8020/data/temp_insight")
>> Thread.sleep(15000)
>> println(fs.exists(new Path("/data/temp_insight")))
>>
>> Output:
>>
>> false
>>
>> While the thread was sleeping for 15 seconds, I checked HDFS with the
>> command
>>
>> hdfs dfs -ls /data/temp_insight
>>
>> Output:
>>
>> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> -rw-r--r--   3 abhijeet supergroup          0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
>> -rw-r--r--   3 abhijeet supergroup        201 2018-06-08 17:48 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>>
>> To cross verify whether it is taking the path of hdfs or not I have added 
>> one more println statement in my code, providing the path which is already 
>> there in HDFS. It's showing true in that case.
>> 
>> So, what could be the reason?
>> 
>> Thanks,
>> 
>> Abhijeet Kumar



Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Jörn Franke
You need some time until the information of the file creation is propagated.

> On 9. Jun 2018, at 08:07, Abhijeet Kumar  wrote:
> 
> I'm modifying a CSV file which is inside HDFS and finally putting it back to 
> HDFS in Spark.
>
> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
> csv_file.coalesce(1).write
>   .format("csv")
>   .mode("overwrite")
>   .save("hdfs://localhost:8020/data/temp_insight")
> Thread.sleep(15000)
> println(fs.exists(new Path("/data/temp_insight")))
>
> Output:
>
> false
>
> While the thread was sleeping for 15 seconds, I checked HDFS with the
> command
>
> hdfs dfs -ls /data/temp_insight
>
> Output:
>
> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> -rw-r--r--   3 abhijeet supergroup          0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
> -rw-r--r--   3 abhijeet supergroup        201 2018-06-08 17:48 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>
> To cross verify whether it is taking the path of hdfs or not I have added one 
> more println statement in my code, providing the path which is already there 
> in HDFS. It's showing true in that case.
> 
> So, what could be the reason?
> 
> Thanks,
> 
> Abhijeet Kumar
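
For the existence check itself, a minimal sketch under assumptions this thread does not confirm: it resolves the filesystem from the same URI that was passed to save, and looks for the _SUCCESS marker shown in the listing above, instead of sleeping for a fixed time. save returns only after the write job has finished, so the check can run directly afterwards. OutputCheck and writeSucceeded are hypothetical names; whether this changes the result in the setup described here is untested.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object OutputCheck {
  /** True if outputDir contains the _SUCCESS marker written by the job.
    * (Hypothetical helper, not part of Spark or Hadoop.)
    */
  def writeSucceeded(conf: Configuration, outputDir: String): Boolean = {
    val dir = new Path(outputDir)
    // Use the filesystem named by the path's scheme and authority
    // (hdfs://localhost:8020 in the question), not whatever fs.defaultFS points to.
    val fs = dir.getFileSystem(conf)
    fs.exists(new Path(dir, "_SUCCESS"))
  }
}
```

With the path from the question this would be OutputCheck.writeSucceeded(spark.sparkContext.hadoopConfiguration, "hdfs://localhost:8020/data/temp_insight").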