date:20240424

DataFrameReader: timestampFormat default value

2024-04-24 Thread keen

Is anyone familiar with [Datetime patterns](https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html) and `TimestampType` parsing in PySpark? When reading CSV or JSON files, timestamp columns need to be parsed via the datasource property `timestampFormat`. [According to documentation](

Re: [spark-graphframes]: Generating incorrect edges

2024-04-24 Thread Mich Talebzadeh

OK let us have a look at these 1) You are using monotonically_increasing_id(), which is not collision-resistant in distributed environments like Spark. Multiple hosts can generate the same ID. I suggest switching to UUIDs (e.g., uuid.uuid4()) for guaranteed uniqueness. 2) Missing values in

Re: [spark-graphframes]: Generating incorrect edges

2024-04-24 Thread Nijland, J.G.W. (Jelle, Student M-CS)

Hi Mich, Thanks for your reply, 1) ID generation is done using monotonically_increasing_id() this is then prefixed with "p_", "m_", "o_" or "org_" depending on the
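For reference, the PySpark API documentation describes `monotonically_increasing_id()` as producing IDs that are monotonically increasing and unique but not consecutive. The current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits, and the docs note the function is non-deterministic because its result depends on partition IDs. The plain-Python sketch below needs no Spark. It emulates that documented layout to show that the same row can receive a different ID if a DataFrame is recomputed with a different partitioning. `assign_ids` and the sample rows are hypothetical illustrations, not a Spark API.

```python
# Plain-Python emulation of the ID layout documented for
# pyspark.sql.functions.monotonically_increasing_id(); no Spark is used.
# assign_ids is a hypothetical helper, not part of any Spark API.

def assign_ids(rows, partition_sizes):
    """Give each row an ID: partition index shifted left by 33 bits,
    plus the record number within that partition."""
    if sum(partition_sizes) != len(rows):
        raise ValueError("partition sizes must cover every row exactly once")
    ids = {}
    remaining = iter(rows)
    for partition_index, size in enumerate(partition_sizes):
        for record_number in range(size):
            ids[next(remaining)] = (partition_index << 33) + record_number
    return ids


rows = ["a", "b", "c", "d", "e", "f"]

# Two partitions of three rows each, matching the example in the Spark docs.
first = assign_ids(rows, [3, 3])
print(sorted(first.values()))
# [0, 1, 2, 8589934592, 8589934593, 8589934594]

# The same rows recomputed with a different partitioning.
second = assign_ids(rows, [2, 4])
print(first["c"], second["c"])  # 2 8589934592

# A type prefix such as "p_" keeps entity types apart,
# but it does not make an ID stable across recomputation.
print(f"p_{first['c']}", f"p_{second['c']}")  # p_2 p_8589934592
```

Within one computation the IDs do not collide. The risk the sketch illustrates is a vertex table and an edge table that each trigger their own recomputation. One hedged option in Spark is to materialize the IDs once, for example by writing the vertex table out and reading it back, before deriving edges from it.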

Re: [spark-graphframes]: Generating incorrect edges

2024-04-24 Thread Mich Talebzadeh

OK few observations 1) ID Generation Method: How are you generating unique IDs (UUIDs, sequential numbers, etc.)? 2) Data Inconsistencies: Have you checked for missing values impacting ID generation? 3) Join Verification: If relevant, can you share the code for joining data points during ID
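Questions 2) and 3) amount to checking the vertex and edge tables against each other. The following is a minimal plain-Python sketch of those checks. It assumes the GraphFrames convention of an `id` column on the vertex DataFrame and `src`/`dst` columns on the edge DataFrame. `check_graph` and the sample IDs, which reuse the `p_`/`m_`/`o_` prefixes mentioned earlier in the thread, are hypothetical.

```python
from collections import Counter


def check_graph(vertex_ids, edges):
    """Report problems that commonly show up as wrong or missing edges.

    vertex_ids: iterable of vertex IDs (the GraphFrames "id" column).
    edges: iterable of (src, dst) pairs (the "src" and "dst" columns).
    """
    vertex_ids = list(vertex_ids)
    counts = Counter(vertex_ids)
    # Only non-null IDs can be legitimate edge endpoints.
    known = {v for v in counts if v is not None}
    return {
        # Vertices whose ID is missing (e.g. a null upstream value).
        "null_ids": sum(1 for v in vertex_ids if v is None),
        # The same ID assigned to more than one vertex.
        "duplicate_ids": sorted(v for v, n in counts.items() if n > 1 and v is not None),
        # Edges pointing at an ID that no vertex carries.
        "dangling_edges": [(s, d) for s, d in edges if s not in known or d not in known],
    }


vertices = ["p_0", "p_1", "m_0", "m_0", None]
edges = [("p_0", "m_0"), ("p_1", "o_7")]
print(check_graph(vertices, edges))
# {'null_ids': 1, 'duplicate_ids': ['m_0'], 'dangling_edges': [('p_1', 'o_7')]}
```

On real DataFrames, the dangling-edge check could be expressed as a `left_anti` join of the edges against the vertices on `src`, and again on `dst`.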

RE: How to add MaxDOP option in spark mssql JDBC

2024-04-24 Thread Appel, Kevin

You might be able to leverage the prepareQuery option, that is at https://spark.apache.org/docs/3.5.1/sql-data-sources-jdbc.html#data-source-option ... this was introduced in Spark 3.4.0 to handle temp table query and CTE query against MSSQL server since what you send in is not actually what

[spark-graphframes]: Generating incorrect edges

2024-04-24 Thread Nijland, J.G.W. (Jelle, Student M-CS)

tags: pyspark,spark-graphframes Hello, I am running pyspark in a podman container and I have issues with incorrect edges when I build my graph. I start with loading a source dataframe from a parquet directory on my server. The source dataframe has the following columns:


6 matches
