The following link has all the required details:
https://aws.amazon.com/blogs/containers/best-practices-for-running-spark-on-amazon-eks/
Let me know if you require further information.
Regards,
Vaquar khan
On Mon, May 15, 2023, 10:14 PM Mich Talebzadeh
wrote:
> Couple of points
>
I saw you are looking for Holden's video; please find the following link.
https://www.oreilly.com/library/view/debugging-apache-spark/9781492039174/
Regards,
Vaquar khan
On Sun, Mar 12, 2023, 6:56 PM Mich Talebzadeh
wrote:
> Hi Denny,
>
> Thanks for the offer. How do you envisage that
@Gourav Sengupta, why are you sending unnecessary emails? If you think
Snowflake is good, please use it; the question here was different and you are
talking about a totally different topic.
Please respect the group guidelines.
Regards,
Vaquar khan
On Wed, Dec 28, 2022, 10:29 AM vaquar khan wrote:
> Here you can f
Here you can find all the details. You just need to pass a Spark DataFrame;
Deequ also generates rule recommendations, and you can write custom,
complex rules.
https://aws.amazon.com/blogs/big-data/test-data-quality-at-scale-with-deequ/
Regards,
Vaquar khan
On Wed, Dec 28, 2022, 9:40 AM
I would suggest Deequ; I have implemented it many times, and it is easy and effective.
Regards,
Vaquar khan
On Tue, Dec 27, 2022, 10:30 PM ayan guha wrote:
> The way I would approach is to evaluate GE, Deequ (there is a python
> binding called pydeequ) and others like Delta Live tables with expect
eaucoup mes amis :)
>
> [1] https://stackoverflow.com/q/66933229/1305344
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
Hi Pedro,
What is your use case? Why did you use coalesce()? Note that repartitioning
is a very expensive operation, since it shuffles data across many partitions,
so try to minimize repartitioning as much as possible.
Regards,
Vaquar khan
On Thu, Mar 18, 2021, 5:47 PM Pedro Tuero wrote:
> I was review
Hi Yang,
Please find the following link:
https://stackoverflow.com/questions/63677736/spark-application-as-a-rest-service/63678337#63678337
Regards,
Vaquar khan
On Wed, Nov 25, 2020 at 12:40 AM Sonal Goyal wrote:
> You should be able to supply the --conf and its values as part of appA
Hi Swetha,
It would be great if you asked the same question on Stack Overflow; we have a
very active community there and monitor Stack Overflow for Spark questions.
If you ask via Stack Overflow, other people with similar problems will also
benefit.
Regards,
Vaquar khan
On Sun, Sep 29, 2019, 10:26 PM swetha
Hi Deepak,
You can use textFileStream.
https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html
Please start asking questions on Stack Overflow as well, so that other people
can benefit from the answers.
Regards,
Vaquar khan
On Sun, Jun 9, 2019, 8:08 AM Deepak Sharma wrote:
> I am using sp
Sure, let me check the JIRA.
Regards,
Vaquar khan
On Thu, Jun 21, 2018, 4:42 PM Takeshi Yamamuro
wrote:
> In this ticket SPARK-24201, the ambiguous statement in the doc had been
> pointed out.
> can you make pr for that?
>
> On Fri, Jun 22, 2018 at 6:17 AM, vaquar khan
>
sion (2.11.x).
Regards,
Vaquar khan
On Thu, Jun 21, 2018 at 11:56 AM, chriswakare <
chris.newski...@intellibridge.co> wrote:
> Hi Rahul,
> This will work only in Java 8.
> Installation does not work with both version 9 and 10
>
> Thanks,
> Christopher
>
>
>
>
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
Regards,
Vaquar khan
On Wed, Jun 20, 2018, 1:18 AM Aakash Basu
wrote:
> Hi guys,
>
> I just wanted to know, why my ParallelGC (*--conf
> "spark.executor.extraJavaOptions=-
Why do you need a tool? You can connect to HBase directly using Spark.
Regards,
Vaquar khan
On Jun 18, 2018 4:37 PM, "Lian Jiang" wrote:
Hi,
I am considering tools to load hbase data using spark. One choice is
https://github.com/Huawei-Spark/Spark-SQL-on-HBase. However, this seems to
be o
persist or any other logical separation in pipeline.
Regards,
Vaquar khan
On Sun, Jun 17, 2018 at 5:25 AM, Eyal Zituny wrote:
> Hi Akash,
> such errors might appear in large spark pipelines, the root cause is a
> 64kb jvm limitation.
> the reason that your job isn't failing at th
Hi Akash,
Please check stackoverflow.
https://stackoverflow.com/questions/41098953/codegen-grows-beyond-64-kb-error-when-normalizing-large-pyspark-dataframe
Regards,
Vaquar khan
On Sat, Jun 16, 2018 at 3:27 PM, Aakash Basu
wrote:
> Hi guys,
>
> I'm getting an error when I'
Please check your JAVA_HOME path.
There may be a special character or a space in the path.
Regards,
Vaquar khan
On Sat, Jun 16, 2018, 1:36 PM Raymond Xie wrote:
> I am trying to run spark-shell in Windows but receive error of:
>
> \Java\jre1.8.0_151\bin\java was unexpected at this time.
>
&
of records will be big delay in
response.
Regards,
Vaquar khan
On Mon, Jun 11, 2018, 2:59 AM Teemu Heikkilä wrote:
> So you are now providing the data on-demand through spark?
>
> I suggest you change your API to query from cassandra and store the
> results from Spark back there,
https://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 6:22 PM, Alec Swan <alecs...@gmail.com> wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format. Ef
Confirmed, you can use accumulators :)
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 10:58 AM, Kedarnath Dixit <
kedarnath_di...@persistent.com> wrote:
> Hi,
>
>
> We need some way to toggle the flag of a variable in transformation.
>
>
> We are thinking to make
as an argument of textFile the path
of the file in the worker filesystem.
Regards,
Vaquar khan
On Fri, Sep 29, 2017 at 2:00 PM, JG Perrin <jper...@lumeris.com> wrote:
> On a test system, you can also use something like
> Owncloud/Nextcloud/Dropbox to insure that the files are synchro
http://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide
Regards,
Vaquar khan
On Fri, Sep 22, 2017 at 4:41 PM, Gokula Krishnan D <email2...@gmail.com>
wrote:
> Thanks for the reply. Forgot to mention that, our Batch ETL Jobs are in
> Core-Spark.
>
>
>
entered into maintenance mode.
Regards,
Vaquar khan
On Sat, Sep 23, 2017 at 4:04 PM, Koert Kuipers <ko...@tresata.com> wrote:
> our main challenge has been the lack of support for missing values
> generally
>
> On Sat, Sep 23, 2017 at 3:41 AM, Irfan Kabli <irfan.kabli.
RIVER_MEMORY, "2g")
.launch();
spark.waitFor();
}
}
*Note:*
a user application is launched using the bin/spark-submit script. This
script takes care of setting up the classpath with Spark and its
dependencies, and can support different cluster managers and deploy mo
://ampcamp.berkeley.edu/6/exercises/time-series-tutorial-taxis.html
Regards,
Vaquar khan
On Wed, Aug 30, 2017 at 1:21 PM, Irving Duran <irving.du...@gmail.com>
wrote:
> I think it will work. Might want to explore spark streams.
>
>
> Thank You,
>
> Irving Duran
>
> On Wed, Au
You are getting the following error because of a dependency mismatch.
Regards,
vaquar khan
On Jul 17, 2017 3:50 AM, "zzcclp" <441586...@qq.com> wrote:
Hi guys:
I am using spark 2.1.1 to test on CDH 5.7.1, when i run on yarn with
following command, error 'N
dashboards. In fact, you can apply Spark’s machine learning
<https://spark.apache.org/docs/latest/ml-guide.html> and graph processing
<https://spark.apache.org/docs/latest/graphx-programming-guide.html> algorithms
on data streams.
Regards,
Vaquar khan
On Sun, Jun 11, 2017 at 3:12 AM,
for memory growth). A simple check
that the file can be read would be:
sc.textFile(file, numPartitions).count()
You can find a good explanation here:
https://stackoverflow.com/questions/29011574/how-does-partitioning-work-for-data-from-files-on-hdfs
Regards,
Vaquar khan
On Jun 11, 2017 5:28 AM
Avoid groupByKey and use reduceByKey.
Regards,
Vaquar khan
On Jun 4, 2017 8:32 AM, "Guy Cohen" <g...@gettaxi.com> wrote:
> Try this one:
>
> df.groupBy(
> when(expr("field1='foo'"),"field1").when(expr("field2='bar'"),"field2&quo
://spark.apache.org/docs/1.1.0/submitting-applications.html
Also try to avoid memory-heavy functions such as collect().
Regards,
Vaquar khan
On Jun 4, 2017 5:46 AM, "Abdulfattah Safa" <fattah.s...@gmail.com> wrote:
I'm working on Spark with Standalone Cluster mode. I need to increase t
Hi,
Please check your firewall security settings. Sharing one good link:
http://belablotski.blogspot.in/2016/01/access-hive-tables-from-spark-using.html?m=1
Regards,
Vaquar khan
On Jun 8, 2017 1:53 AM, "Patrik Medvedev" <patrik.medve...@gmail.com> wrote:
> Hello guy
It depends on programming style. I would suggest setting up a few rules to
avoid complex code in Scala and, if needed, asking programmers to add proper
comments.
Regards,
Vaquar khan
On Jun 8, 2017 4:17 AM, "JB Data" <jbdat...@gmail.com> wrote:
> Java is Object langage borned to D
Hi Ayan,
If you have multiple files (for example, 12 files) and you use the following
code, then you will get 12 partitions.
r = sc.textFile("file://my/file/*")
Not sure what you want to know about the file system; please check the API
docs.
Regards,
Vaquar khan
On Jun 8, 2017 10:44 AM,
You can add a filter, or replace nulls with a value such as 0 or a string.
df.na.fill(0, Seq("y"))
Regards,
Vaquar khan
On Jun 2, 2017 11:25 AM, "Alonso Isidoro Roman" <alons...@gmail.com> wrote:
not sure if this can help you, but you can infer programmatically the
schema pr
Hi,
I found the following two links helpful and am sharing them with you:
http://stackoverflow.com/questions/38353524/how-to-ensure-partitioning-induced-by-spark-dataframe-join
http://spark.apache.org/docs/latest/configuration.html
Regards,
Vaquar khan
On Wed, Mar 29, 2017 at 2:45 PM, Vidya Sujeet
Please read the Spark documentation at least once before asking questions.
http://spark.apache.org/docs/latest/streaming-programming-guide.html
http://2s7gjr373w3x22jf92z99mgm5w-wpengine.netdna-ssl.com/wp-content/uploads/2015/11/spark-streaming-datanami.png
Regards,
Vaquar khan
On Fri, Mar 10, 2017
/content/troubleshooting/javaionotserializableexception.html
Regards,
Vaquar khan
On Fri, Feb 17, 2017 at 9:36 PM, Darshan Pandya <darshanpan...@gmail.com>
wrote:
> Hello,
>
> I am getting the famous serialization exception on running some code as
> below,
>
> val
Did you try MSCK REPAIR TABLE?
Regards,
Vaquar Khan
On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com>
wrote:
> I dont think so, i was able to insert overwrite other created tables in
> hive using spark sql. The only problem I am facing
Hi Asmath,
Try refreshing the table:
// spark is an existing SparkSession
spark.catalog.refreshTable("my_table")
http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing
Regards,
Vaquar khan
On Sun, Feb 5, 2017 at 7:19 PM, KhajaAsmath Mohammed &l
https://databricks.gitbooks.io/databricks-spark-reference-applications/content/timeseries/index.html
Regards,
Vaquar khan
On Wed, Jan 11, 2017 at 10:07 AM, Dirceu Semighini Filho <
dirceu.semigh...@gmail.com> wrote:
> Hello Rishabh,
> We have done some forecasting, for time-series,
Hi Deepak,
Could you share the index information in your database?
select * from indexInfo;
Regards,
Vaquar khan
On Sat, Dec 17, 2016 at 2:45 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> How many workers are in the cluster?
>
> On Sat, Dec 17, 2016 at 12:23 PM Deepak S
Hi Kant,
Hope the following information helps.
1)Cluster
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-standalone.html
http://spark.apache.org/docs/latest/hardware-provisioning.html
2) Yarn vs Mesos
https://www.linkedin.com/pulse/mesos-compare-yarn-vaquar-khan
For that kind of issue, the Spark UI and DAG visualization are always helpful.
https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html
Regards,
Vaquar khan
On Fri, Dec 16, 2016 at 11:10 AM, Vikas K. <vikas.re...@gmail.com> wrote:
> Unsubscribe.
&
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-rdd-partitions.html
Regards,
vaquar khan
On Wed, Dec 14, 2016 at 12:15 PM, Vaibhav Sinha <mail.vsi...@gmail.com>
wrote:
> Hi,
> I see a similar behaviour in an exactly similar scenario at my deployment
> as w
Not sure about your 0/1 logic, but you can orderBy the data according to
time and take the first value.
Regards,
Vaquar khan
On Wed, Dec 14, 2016 at 10:49 PM, Milin korath <milin.kor...@impelsys.com>
wrote:
> Hi
>
> I have a spark data frame with following structure
>
Hi Neeraj,
As per my understanding, Spark SQL doesn't support UPDATE statements.
Why do you need the UPDATE command in Spark SQL? You can run the command in
Hive.
Regards,
Vaquar khan
On Mon, Dec 12, 2016 at 10:21 PM, Niraj Kumar <nku...@incedoinc.com> wrote:
> Hi
>
>
>
> I am work
I found the following links good, as I am using the same:
http://spark.apache.org/docs/latest/tuning.html
https://spark-summit.org/2014/testing-spark-best-practices/
Regards,
Vaquar khan
On 8 Aug 2016 10:11, "Deepak Sharma" <deepakmc...@gmail.com> wrote:
> Hi All,
> Can
Hi Asfandyar,
A *NoSuchMethodError* in Java means you compiled against one version of the
code and executed against a different version.
Please make sure your Java version and the dependency versions you add work
with the same Java version.
regards,
vaquar khan
On Fri, Jun 10, 2016 at 4:50 AM, Asfandyar
n “start-all.sh”, the Worker IP
>> address become 127.0.0.1, and then I tried “ifconfig l0 down” and the
>> Worker IP address become 127.0.1.1.
>>
>> What should I do to make IP use the IP address of the Ethernet instead of
>> the address of the wireless?
>>
>> Thanks
>>
>> Jay
>>
>>
>>
>> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
>> Windows 10
>>
>>
>>
>
>
--
Regards,
Vaquar Khan
+91 830-851-1500
Hi Sharad,
The array size you (or the serializer) are trying to allocate is just too big
for the JVM.
You can also split your input further by increasing parallelism.
The following is a good explanation:
https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit
regards,
Vaquar khan
/client/src/main/resources/spark-action-0.1.xsd
regards,
Vaquar Khan
On Wed, Jun 8, 2016 at 5:26 AM, karthi keyan <karthi93.san...@gmail.com>
wrote:
> Hi ,
>
> Make sure you have oozie 4.2.0 and configured with either yarn / mesos
> mode.
>
> Well, you just parse your scala
and Spark Streaming or do an
incremental select to make sure your Spark SQL tables stay up to date with
your production databases
Regards,
Vaquar khan
On 7 Jun 2016 10:29, "Deepak Sharma" <deepakmc...@gmail.com> wrote:
I am not sure if Spark provides any support for incremental ext
Hi Abhishek,
Please learn Spark; there are no shortcuts to success.
Regards,
Vaquar khan
On 29 Jul 2015 11:32, Mishra, Abhishek abhishek.mis...@xerox.com wrote:
Hello,
Please help me with links or some document for Apache Spark interview
questions and answers. Also for the tools related
My choice is Java 8.
On 15 Jul 2015 18:03, Alan Burlison alan.burli...@oracle.com wrote:
On 15/07/2015 08:31, Ignacio Blasco wrote:
The main advantage of using scala vs java 8 is being able to use a console
https://bugs.openjdk.java.net/browse/JDK-8043364
--
Alan Burlison
--
I would suggest studying Spark, Flink, and Storm, and based on your
understanding and findings, preparing your research paper.
Maybe you will invent the next Spark ☺
Regards,
Vaquar khan
On 16 Jul 2015 00:47, Michael Segel msegel_had...@hotmail.com wrote:
Silly question…
When thinking about a PhD thesis
Totally agreed with Hafsa: you need to identify your requirements and needs
before choosing Spark.
If you want fast data access, go with NoSQL (Mongo, Aerospike, etc.); if you
need data analytics, then Spark is best.
Regards,
Vaquar khan
On 14 Jul 2015 20:39, Hafsa Asif hafsa.a
I am using SBT
On 26 Jan 2015 15:54, Luke Wilson-Mawer lukewilsonma...@gmail.com wrote:
I use this: http://scala-ide.org/
I also use Maven with this archetype:
https://github.com/davidB/scala-archetype-simple. To be frank though, you
should be fine using SBT.
On Sat, Jan 24, 2015 at 6:33