Well, I do not really need to do it while another job is editing them.
I just need to get the names of the folders when I read through
textFile("path/to/dir/*/*/*.js")
Using native Hadoop libraries, can I do something like
fs.copy("/my/path/*/*", "new/path/")?
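Roughly, yes. Below is a hedged sketch using the Hadoop FileSystem API (the paths are the placeholders from this thread, and this assumes a default Hadoop Configuration is appropriate for your cluster):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
val fs   = FileSystem.get(conf)

// globStatus expands wildcards the same way sc.textFile does.
// It can return null for some inputs, hence the Option guard.
val matches = Option(fs.globStatus(new Path("/my/path/*/*")))
                .getOrElse(Array.empty)

matches.foreach { status =>
  // FileUtil.copy copies a file or directory tree between (possibly
  // different) FileSystems; set deleteSource = true to move instead.
  FileUtil.copy(fs, status.getPath,
                fs, new Path("new/path", status.getPath.getName),
                /* deleteSource = */ false, conf)
}
```

Setting deleteSource to true turns the copy into a move, which is what you would want for a "read once, then sweep aside" workflow.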
Narek Galstyan
This won't work, as you can never guarantee which files were read by Spark
if some other process is writing files to the same location. It would be
far less work to move files matching your pattern to a staging location and
then load them from there using sc.textFile. You should find the HDFS
FileSystem API calls sufficient for the move.
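A minimal sketch of that staging approach, assuming spark-shell (where sc is predefined) and a hypothetical /staging directory:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fs   = FileSystem.get(conf)

// "/staging" is a hypothetical directory used only for this sketch.
val staging = new Path("/staging")
if (!fs.exists(staging)) fs.mkdirs(staging)

// Freeze the current set of matches by renaming them into staging;
// within one HDFS cluster a rename is a cheap metadata operation.
Option(fs.globStatus(new Path("path/to/dir/*/*/*.js")))
  .getOrElse(Array.empty)
  .foreach { status =>
    fs.rename(status.getPath, new Path(staging, status.getPath.getName))
  }

// Only the frozen files are read, no matter what other jobs write to
// the original location in the meantime.
val lines = sc.textFile("/staging/*")
```

One caveat: this flattens the directory structure into staging, so files with the same name from different source folders would collide; include part of the source path in the staged name if that matters.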
Dear Spark users,
I am reading a set of JSON files to compile them into the Parquet data
format. I would like to mark the folders in some way after having read
their contents so that I do not read them again (e.g. I could change the
name of the folder).
I use the sc.textFile("path/to/dir/*/*/*.js") technique.
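For the JSON-to-Parquet step itself, a sketch assuming the Spark 1.x spark-shell, where sqlContext is predefined (the output path is hypothetical):

```scala
// Read the JSON files (the same glob as above) into a DataFrame with an
// inferred schema, then write them out as Parquet.
val df = sqlContext.read.json("path/to/dir/*/*/*.js")
df.write.mode("append").parquet("path/to/parquet-output")
```

Using the DataFrame reader rather than sc.textFile lets Spark infer the JSON schema and write typed Parquet directly.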