Re: get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")

2015-10-27 Thread Նարեկ Գալստեան
Well, I do not really need to do it while another job is editing them. I just need to get the names of the folders when I read through textFile("path/to/dir/*/*/*.js"). Using native Hadoop libraries, can I do something like fs.copy("/my/path/*/*", "new/path/")?

Narek Galstyan / Նարեկ Գալստյան
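On HDFS, the closest real calls to the hypothetical fs.copy above are FileSystem.globStatus (to expand the wildcard) followed by FileUtil.copy per match. A minimal local-filesystem sketch of that expand-then-copy pattern in Python (the name copy_matching is invented for illustration):

```python
import glob
import os
import shutil

def copy_matching(pattern, dest_dir):
    """Expand a wildcard pattern and copy every matched file into dest_dir,
    mirroring FileSystem.globStatus + FileUtil.copy on HDFS.
    Assumes matched files have distinct basenames."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in glob.glob(pattern):
        if os.path.isfile(path):
            target = os.path.join(dest_dir, os.path.basename(path))
            shutil.copy(path, target)
            copied.append(target)
    return copied
```

The same loop structure carries over to the Hadoop API, with glob.glob replaced by globStatus and shutil.copy by FileUtil.copy.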

Re: get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")

2015-10-27 Thread Deenar Toraskar
This won't work, as you can never guarantee which files were read by Spark if some other process is writing files to the same location. It would be far less work to move files matching your pattern to a staging location and then load them using sc.textFile. The HDFS FileSystem calls should cover this.
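The move-to-staging approach above can be sketched as follows, using the local filesystem as a stand-in for HDFS (on HDFS the move would be FileSystem.rename, which is atomic per file; the helper name move_to_staging is invented here):

```python
import glob
import os
import shutil

def move_to_staging(pattern, staging_dir):
    """Move every file matched by pattern into staging_dir and return the
    staged paths. The job then reads only the staged copies (e.g. with
    sc.textFile(staging_dir + "/*")), so concurrent writers to the original
    location can no longer change what the job sees.
    Assumes matched files have distinct basenames."""
    os.makedirs(staging_dir, exist_ok=True)
    staged = []
    for path in glob.glob(pattern):
        if os.path.isfile(path):
            target = os.path.join(staging_dir, os.path.basename(path))
            shutil.move(path, target)
            staged.append(target)
    return staged
```

Because the files are moved rather than copied, a second run of the same glob sees nothing, which is exactly the "don't read it again" guarantee being asked for.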

get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")

2015-10-27 Thread Նարեկ Գալստեան
Dear Spark users, I am reading a set of JSON files to compile them into the Parquet data format. I would like to mark the folders in some way after having read their contents so that I do not read them again (e.g. I could change the name of the folder). I use the .textFile("path/to/dir/*/*/*.js") technique.