[ https://issues.apache.org/jira/browse/SPARK-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906723#comment-14906723 ]
Marcelo Vanzin commented on SPARK-10804:
----------------------------------------

This is really a Hive issue, which Spark just inherits since it calls the Hive code directly to handle that statement.

> "LOCAL" in LOAD DATA LOCAL INPATH means "remote"
> ------------------------------------------------
>
>                 Key: SPARK-10804
>                 URL: https://issues.apache.org/jira/browse/SPARK-10804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Antonio Piccolboni
>
> When connecting to a remote thriftserver with a custom JDBC client or
> beeline, LOAD DATA LOCAL INPATH fails. The HiveServer2 docs explain in a
> quick comment that local now means local to the server. I think this is
> just a rationalization for a bug. When a user types "local"
> # it needs to be local to him, not some server
> # Failing 1., one needs to have a way to determine what local means and
> create a "local" item under the new definition.
> With the thriftserver, I have a host to connect to, but I don't have any
> way to create a file local to that host, at least in Spark. It may not be
> desirable to create user directories on the thriftserver host or to run
> file transfer services like scp. Moreover, it appears that this syntax is
> unique to Hive and Spark, but its origin can be traced to LOAD DATA LOCAL
> INFILE in Oracle and was adopted by MySQL. In the latter docs we can read
> "If LOCAL is specified, the file is read by the client program on the
> client host and sent to the server. The file can be given as a full path
> name to specify its exact location. If given as a relative path name, the
> name is interpreted relative to the directory in which the client program
> was started". This is not to say that the Spark or Hive teams are bound to
> what Oracle and MySQL do, but to support the idea that the meaning of
> LOCAL is settled. For instance, the Impala documentation says: "Currently,
> the Impala LOAD DATA statement only imports files from HDFS, not from the
> local filesystem. It does not support the LOCAL keyword of the Hive LOAD
> DATA statement." I think this is a better solution. The way things are in
> thriftserver, I developed a client under the assumption that I could use
> LOAD DATA LOCAL INPATH, and all tests were passing in standalone mode,
> only to find with the first distributed test that
> # LOCAL means "local to server", a.k.a. "remote"
> # INSERT INTO ... VALUES is not supported
> # There is really no workaround unless one assumes access to whatever
> data store Spark is running against, like HDFS, and that the user can
> upload data to it.
> In the space of workarounds it is not terrible, but if you are trying to
> write a self-contained Spark package, that's a defeat and makes writing
> tests particularly hard.
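
The workaround mentioned in the report (upload the file to the data store the server reads from, then load it without LOCAL) can be sketched roughly as below. This is only an illustration under stated assumptions: the helper name staged_load_plan, the staging directory /tmp/staging, and the token prefix are all hypothetical, and the sketch assumes the client can reach HDFS through the `hdfs dfs -put` command. It only builds the upload command and the statement; it does not run either.

```python
import posixpath
import uuid


def staged_load_plan(local_file, table, staging_dir="/tmp/staging", token=None):
    """Plan a two-step load that avoids the LOCAL keyword.

    Returns (upload_cmd, sql). Nothing is executed here.
    staging_dir and token are hypothetical; choose values that fit your cluster.
    """
    # A random token keeps concurrent uploads from colliding in the staging dir.
    token = token or uuid.uuid4().hex
    # Keep only the file name; normalise Windows separators first.
    name = posixpath.basename(local_file.replace("\\", "/"))
    target = posixpath.join(staging_dir, f"{token}-{name}")
    if "'" in target:
        raise ValueError("single quotes in the path are not handled by this sketch")
    # Step 1: copy the client-side file into the shared store.
    upload_cmd = ["hdfs", "dfs", "-put", local_file, target]
    # Step 2: load from the shared store; without LOCAL the server resolves the path.
    sql = f"LOAD DATA INPATH '{target}' INTO TABLE {table}"
    return upload_cmd, sql


if __name__ == "__main__":
    cmd, stmt = staged_load_plan("data/part.csv", "events")
    print(" ".join(cmd))
    print(stmt)
```

The upload command would be run on the client, for example with subprocess.run, and the statement sent over the existing JDBC or beeline connection. As the report notes, this only helps when the client has write access to the store the server is running against.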