Hi,
I want to set up a TPC-DS benchmark to test the performance of a Spark feature.
I saw that TPCDSQueryBenchmark needs a --data-location argument passed to the
class. My question is: how do I generate the TPC-DS data for this benchmark?
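One common way to produce the --data-location directory is the databricks/spark-sql-perf package together with the TPC-DS dsdgen tool. A rough sketch — the paths and scale factor below are placeholders, and the TPCDSTables signature may differ slightly across spark-sql-perf versions:

```scala
// Assumes the databricks/spark-sql-perf jar is on the classpath and
// tpcds-kit has been built so that dsdgen is available locally.
import com.databricks.spark.sql.perf.tpcds.TPCDSTables

val tables = new TPCDSTables(
  spark.sqlContext,
  dsdgenDir = "/path/to/tpcds-kit/tools", // directory containing dsdgen
  scaleFactor = "1")                      // roughly the size in GB

// Write the generated tables as parquet; this output path is what you
// then pass to TPCDSQueryBenchmark via --data-location.
tables.genData(
  location = "/tmp/tpcds-data",
  format = "parquet",
  overwrite = true,
  partitionTables = true,
  clusterByPartitionColumns = false,
  filterOutNullPartitionValues = false,
  tableFilter = "",
  numPartitions = 4)
```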
```
/**
 * Benchmark to measure TPCDS query performance.
 */
```
Hi all,
I have a query:
```
spark.sql("select
distinct cust_id,
cast(b.device_name as varchar(200)) as devc_name_cast,
prmry_reside_cntry_code
from (select * from ${model_db}.crs_recent_30d_SF_dim_cust_info where
dt='${today}') a
join fact_rsk_magnes_txn b on
```
SPARK-27638.
You can set spark.sql.legacy.typeCoercion.datetimeToString.enabled to true to
restore the old behavior.
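As a sketch, the legacy flag can be set per session before running the query (the flag exists in Spark 3.x):

```sql
-- Restore the Spark 2.4 date/timestamp-to-string coercion in comparisons
SET spark.sql.legacy.typeCoercion.datetimeToString.enabled=true;
```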
On Mon, Mar 6, 2023 at 10:27 AM zhangliyun wrote:
Hi all,
I have a Spark SQL query; in Spark 2.4.2 it ran correctly, but after I
upgraded to Spark 3.1.3 it has a problem.
The SQL:
```
select * from eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly where
dt >= date_sub('${today}',30);
```
It will load the data of the past 30 days.
Hi all:
I want to ask about metrics that show whether an executor's memory is fully
used. I always see the following in the log, which I guess means I did not
fully use the executor's memory. But I don't want to open the log to check;
are there any metrics that show this?
my
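One place to look without reading logs is Spark's monitoring REST API: the executors endpoint reports memoryUsed and maxMemory per executor. The host, port, and application id below are placeholders:

```
GET http://<driver-host>:4040/api/v1/applications/<app-id>/executors
```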
> org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
> jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>
> Cheers,
> -z
>
> From: zhangliyun
> Sent: Monday, May 11, 2020 9:44
Hi all:
I want to ask a question: it seems my Spark job has hung for 20+ hours.
The Spark history log shows 8999 completed tasks while 2 are not finished,
but when I go to the tasks page I do not find any running tasks; all tasks
are either failed or successful. I guess it seems that all
main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L306.
`SparkPlanGraph` has a `makeDotFile` method where you can write out a `.dot`
file and visualize it with Graphviz tools, e.g. http://www.webgraphviz.com/
Thanks,
Manu
On Thu, Apr 30, 2020 at 3:21 PM zhangliyun wrote:
Hi all
I want to ask: is there any tool to visualize the Spark physical plan?
Sometimes the physical plan is very long, so it is difficult to view.
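Besides makeDotFile, a lighter option in Spark 3.x is the formatted explain mode, which splits the plan into a compact operator tree plus per-node details (the query below is just a placeholder):

```scala
// "formatted" mode prints a short operator tree followed by node details,
// which is easier to read than the single-block physical plan string
spark.sql("select * from testdata").explain("formatted")
```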
Best Regards
KellyZhang
Hi all:
I want to ask how to estimate the size of an RDD (in bytes) when it is not
saved to disk, because the job takes a long time when the output is very
large and the number of output partitions is small.
The following steps are how I approach this problem:
1. sample 0.01 's
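The sampling idea in step 1 can be sketched with Spark's SizeEstimator, assuming `rdd` is the RDD in question. Note this measures in-memory object size (including JVM overhead), not serialized on-disk size, and the 1% fraction is just the value from the mail:

```scala
import org.apache.spark.util.SizeEstimator

val fraction = 0.01
// Sample ~1% of the RDD without replacement
val sample = rdd.sample(withReplacement = false, fraction)
// Estimate bytes of each sampled element, then scale back up
val sampleBytes =
  sample.map(r => SizeEstimator.estimate(r.asInstanceOf[AnyRef]).toDouble).sum()
val estimatedTotalBytes = sampleBytes / fraction
```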
Forwarded message
From: "zhangliyun"
Sent: 2019-12-03 05:56:55
To: "Wenchen Fan"
Subject: Re: Re: A question about rdd bytes size
Hi Fan:
Thanks for the reply. I agree that how the data is stored determines the
total bytes of the table file.
In my
Hi:
I want to get the total bytes of a DataFrame with the following function, but
when I insert the DataFrame into Hive, I find that the function's value
differs from spark.sql.statistics.totalSize:
spark.sql.statistics.totalSize is less than the result of the function.
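A difference is expected here: spark.sql.statistics.totalSize reflects the serialized, usually compressed, files on disk, while in-memory estimates count Java object overhead. To refresh and inspect the on-disk statistic (table name below is a placeholder):

```sql
-- Recompute table-level statistics after the insert
ANALYZE TABLE mydb.mytable COMPUTE STATISTICS;
-- totalSize appears under Statistics in the detailed output
DESCRIBE TABLE EXTENDED mydb.mytable;
```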
Hi all:
I saw the skewed join hint optimization in
https://docs.azuredatabricks.net/delta/join-performance/skew-join.html.
It is a great feature that helps users avoid the problems caused by skewed
data. My questions:
1. Which version will have this? I have not found the feature in the
y generated by using a NOT IN (subquery); if you are OK
with slightly different NULL semantics, you could use NOT EXISTS (subquery).
The latter should perform a lot better.
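The suggested rewrite, sketched on the testdata table from the thread — note the two forms differ when key1 can be NULL:

```sql
-- NOT IN is null-aware and typically plans a BroadcastNestedLoopJoin
select * from testdata a
where a.key1 not in (select b.key1 from testdata b);

-- NOT EXISTS is a plain left-anti join and can use hash/sort-merge strategies
select * from testdata a
where not exists (select 1 from testdata b where b.key1 = a.key1);
```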
On Wed, Oct 23, 2019 at 12:02 PM zhangliyun wrote:
Hi all:
I want to ask a question about broadcast nested loop join.
OOM happens.
Maybe there is an algorithm to implement left/right join in a distributed
environment without broadcast, but currently Spark is only able to deal with it
using broadcast.
On Wed, Oct 23, 2019 at 6:02 PM zhangliyun wrote:
Hi all:
I want to ask a question about broadcast nested loop join. From Google I know
that left outer/semi joins and right outer/semi joins will use broadcast
nested loop join, and in some cases, when the input data is very small, it is
suitable to use. So here:
how do we define the input data as very
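"Very small" is governed by a size threshold: tables whose estimated size falls below it are candidates for broadcast. A sketch, where 10485760 (10 MB) is the default value:

```sql
-- Tables below this many bytes may be broadcast; set to -1 to disable
SET spark.sql.autoBroadcastJoinThreshold=10485760;
```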
Hi all:
I used a NOT IN subquery like the following in Spark 2.3.1:
{code}
set spark.sql.autoBroadcastJoinThreshold=-1;
explain select * from testdata where key1 not in (select key1 from testdata as
b);
== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, LeftAnti, ((key1#60 = key1#62) ||
Johann, Uwe Reimann
On 23.08.2019 at 09:43, zhangliyun wrote:
Hi all:
When I use the Spark dynamic partition feature, I met a problem with the HDFS
quota. I found that it is very easy to hit the quota limit (exceeding the
maximum quota of a directory).
I have generated an unpartitioned table 'bsl12.email_edge_lyh_mth1' which
contains 584M records and will
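One common way to reduce the number of files (and thus quota pressure) in a dynamic-partition insert is to shuffle rows by the partition column first, so each partition value is written by one task. A sketch with placeholder table names:

```sql
-- Each dt value lands in a single task, producing one file per partition
-- instead of one file per task per partition
INSERT OVERWRITE TABLE target_table PARTITION (dt)
SELECT * FROM source_table
DISTRIBUTE BY dt;
```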
Hi All:
I have a question about the repartition API and Spark SQL partitioning. I
have a table whose partition key is day:
```
./bin/spark-sql -e "CREATE TABLE t_original_partitioned_spark (cust_id int,
loss double) PARTITIONED BY (day STRING) location
l
```
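For context, repartition controls how many tasks (and thus output files) write the data, while PARTITIONED BY (day) only controls the directory layout. A sketch assuming a DataFrame `df` with a day column:

```scala
import org.apache.spark.sql.functions.col

// One shuffle partition per day value, so each day directory gets few
// files; the table layout itself still comes from PARTITIONED BY (day)
df.repartition(col("day"))
  .write
  .mode("overwrite")
  .insertInto("t_original_partitioned_spark")
```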
On 29 July 2019 at 07:12:30, zhangliyun (kelly...@126.com) wrote:
Hi all:
I want to ask a question about broadcast join in Spark SQL.
```
select A.*,B.nsf_cards_ratio * 1.00 / A.nsf_on_entry as nsf_ratio_to_pop
from B
left join A
on trim(A.country) = trim(B.cntry_code);
```
Here A is a small table with only 8 rows, but somehow the statistics of table A
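When the statistics are off, you can force the small side to be broadcast with an explicit hint (for a left outer join only the right side, A here, can be broadcast):

```sql
select /*+ BROADCAST(A) */
  A.*, B.nsf_cards_ratio * 1.00 / A.nsf_on_entry as nsf_ratio_to_pop
from B
left join A
  on trim(A.country) = trim(B.cntry_code);
```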
rg
Date:2019-03-26 14:34:20
Subject: RE: How to build a single jar for a single project in Spark
You can try this
https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually
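From that page, a single module plus its dependencies can be built with Maven's -pl/-am flags; for example (the module name may vary with the Scala version suffix):

```shell
# Build only the spark-sql module and the modules it depends on
./build/mvn -pl :spark-sql_2.12 -am -DskipTests package
```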
Thanks,
Gerry
From: zhangliyun
Sent: March 26, 2019 16:50
To: dev@spark.apache.org