[ https://issues.apache.org/jira/browse/SPARK-24047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24047.
----------------------------------
    Resolution: Invalid

This sounds more like a question. Please ask it on the mailing list; you may get a better answer there. Please reopen this if we find it is actually an issue.

> use spark package to load csv file
> ----------------------------------
>
>                 Key: SPARK-24047
>                 URL: https://issues.apache.org/jira/browse/SPARK-24047
>             Project: Spark
>          Issue Type: IT Help
>          Components: Input/Output
>    Affects Versions: 2.3.0
>            Reporter: Jijiao Zeng
>            Priority: Major
>
> I am new to Spark. I used the spark.read.csv() function to read a local CSV file, but I got the following error:
>
>   File "<stdin>", line 1, in <module>
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 439, in csv
>     return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o58.csv.
> : java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/pythonconverters
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/ml
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/hello
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/resources
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib/stat
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/als
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/sql
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark.egg-info
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/hello/sub_hello
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/licenses
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/parquet_partitioned
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/sbin
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/orc_partitioned
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/tests/testthat
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/profile
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql/hive
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/kubernetes/dockerfiles/spark
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/html
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib/linalg
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/jars
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/worker
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/graphx
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images/multi-channel
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/streaming/clickstream
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/conf
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/ml
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/sql/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/docs
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/ridge-data
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/help
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/ml
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/mllib
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r/ml
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/meta
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/r
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/mllib
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/bin
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/python/pyspark
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/mllib
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/yarn
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/ml/param
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/ml
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/streaming
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql/hive
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/jars
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images/kittens
> file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/graphx
>
> If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
>     at scala.Predef$.assert(Predef.scala:170)
>     at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:133)
>     at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:98)
>     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:153)
>     at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:71)
>     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
>     at org.apache.spark.sql.execution.datasources.DataSource.combineInferredAndUserSpecifiedPartitionSchema(DataSource.scala:115)
>     at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:166)
>     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
>     at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
>     at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:594)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>     at py4j.Gateway.invoke(Gateway.java:282)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:214)
>     at java.base/java.lang.Thread.run(Thread.java:844)
>
> Any suggestions would be appreciated. Thanks in advance.
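For context, the assertion comes from Spark's partition discovery: when `spark.read.csv()` is given a directory, it lists every file underneath and tries to infer partition columns, which requires all leaf directories to sit at a consistent nesting level under one base path. Pointing the reader at the entire Spark install directory mixes many unrelated depths, so `PartitioningUtils.parsePartitions` fails its assertion. A rough, Spark-free sketch of that depth check follows; the helper names are hypothetical and the real logic (which compares inferred base paths, not just depths) is considerably richer:

```python
def leaf_depths(paths):
    """Nesting depth of each candidate leaf path, the way partition
    discovery expects a single consistent level under one base path."""
    return {p.rstrip("/").count("/") for p in paths}

def check_partition_layout(paths):
    """Raise if the listed directories sit at conflicting depths,
    mimicking the 'Conflicting directory structures detected' error."""
    if len(leaf_depths(paths)) > 1:
        raise AssertionError(
            "Conflicting directory structures detected. Suspicious paths:\n"
            + "\n".join(sorted(paths))
        )

# A single CSV file (or a flat directory of CSVs) passes the check:
check_partition_layout(["file:/users/jzeng/data/part-00000.csv"])

# Scanning the whole Spark distribution mixes depths and fails,
# just like the traceback above:
try:
    check_partition_layout([
        "file:/users/jzeng/spark-2.3.0-bin-hadoop2.7",
        "file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/streaming",
    ])
except AssertionError as e:
    print("would raise:", str(e).splitlines()[0])
```

Assuming the intent was simply to read one local CSV, the practical fix is to pass the file's own path, e.g. `spark.read.csv("/Users/jzeng/path/to/file.csv", header=True)`, rather than a directory containing the whole Spark distribution; alternatively, set the `basePath` option as the error message suggests.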
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org