robertnagy1 opened a new issue, #1383:
URL: https://github.com/apache/sedona/issues/1383
## Expected behavior
Running `df = sedona.read.format("geoparquet").load("/lakehouse/default/Files/samples/parquet/buildings.parquet")` should return a Spark DataFrame.
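For context, a minimal sketch of how the `sedona` object above is set up, following the Sedona 1.5 Python docs (the setup cell is an assumption about the notebook environment, not part of the failing cell):

```python
from sedona.spark import SedonaContext

# In a Fabric notebook a Spark session ("spark") already exists, so it can be
# passed straight to SedonaContext.create, which registers Sedona's types,
# functions, and data sources (including "geoparquet") on that session.
sedona = SedonaContext.create(spark)

df = sedona.read.format("geoparquet").load(
    "/lakehouse/default/Files/samples/parquet/buildings.parquet"
)
df.printSchema()
```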
## Actual behavior
Returns an error
Py4JJavaError                             Traceback (most recent call last)
Cell In[16], line 1
----> 1 sedona.read.format("geoparquet").load("/lakehouse/default/Files/samples/parquet/buildings.parquet")

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:300, in DataFrameReader.load(self, path, format, schema, **options)
    298 self.options(**options)
    299 if isinstance(path, str):
--> 300     return self._df(self._jreader.load(path))
    301 elif path is not None:
    302     if type(path) != list:

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
    167 def deco(*a: Any, **kw: Any) -> Any:
    168     try:
--> 169         return f(*a, **kw)
    170     except Py4JJavaError as e:
    171         converted = convert_exception(e.java_exception)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o6230.load.
: Operation failed: "Bad Request", 400, HEAD, http://onelake.dfs.fabric.microsoft.com/§redacted§/lakehouse/default/Files/samples/parquet/buildings.parquet?upn=false&action=getStatus&timeout=90
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:231)
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:191)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:189)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:690)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1053)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:650)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:640)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.exists(AzureBlobFileSystem.java:1236)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:757)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:755)
	at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:393)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
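The failing call is the ABFS driver's `getStatus` HEAD request against onelake.dfs.fabric.microsoft.com. One thing worth testing is whether the fully qualified OneLake URI behaves differently from the local mount path. A sketch only, untested here; `MyWorkspace` and `MyLakehouse` are placeholder names, not values from my environment:

```python
# Workaround sketch with placeholder names. OneLake paths follow the
# documented pattern:
#   abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Files/...
abfss_path = (
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com"
    "/MyLakehouse.Lakehouse/Files/samples/parquet/buildings.parquet"
)
df = sedona.read.format("geoparquet").load(abfss_path)
```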
## Steps to reproduce the problem
Add a parquet file to the default Lakehouse of a workspace, then try to read it through the `/lakehouse/default/...` mount path as shown above.
Something I noticed: Fabric mounts in the Lakehouse can be reached with the Python `os` package but not with the `notebookutils.mssparkutils` package, so `mssparkutils` returns the DFS location containing the UUIDs rather than the alias of the mounted path (see the sketch below). That might be relevant here.
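To illustrate the difference, a sketch (the exact output depends on the workspace, and the relative `Files/...` path is assumed to resolve against the default lakehouse):

```python
import os

# Plain Python sees the local mount alias:
print(os.listdir("/lakehouse/default/Files/samples/parquet"))

# mssparkutils resolves the same folder to the DFS URL containing the
# workspace and lakehouse UUIDs instead of the mounted alias:
from notebookutils import mssparkutils

for entry in mssparkutils.fs.ls("Files/samples/parquet"):
    print(entry.path)  # e.g. abfss://<uuid>@onelake.dfs.fabric.microsoft.com/<uuid>/Files/...
```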
## Settings
Sedona version = 1.5.1
Apache Spark version = 3.4
Apache Flink version = Not applicable
API type = Python
Scala version = 2.12
JRE version = Cluster default
Python version = Cluster default
Environment = Microsoft Fabric