wot no toggle ?
https://spark.apache.org/docs/3.0.0-preview/web-ui.html#storage-tab

On that link, in one of the screenshots, there are two checkboxes: ON HEAP MEMORY and OFF HEAP MEMORY. That is about as useful as Barry Humphries wearing a gold dress as Dame Edna. Which monkey came up with that? None of the monkeys here noticed it? Ever heard of a toggle switch? Look behind you, at the light switch. That is a toggle switch: ON/OFF.

Jane thorpe
janethor...@aol.com
Re:RE: Going it alone.
F*U*C*K O*F*F C*U*N*T*S

On Thursday, 16 April 2020, Kelvin Qin wrote:

No wonder I said I couldn't understand what the mail was expressing; it feels like a joke......

On 2020-04-16 02:28:49, seemanto.ba...@nomura.com.INVALID wrote:

Have we been tricked by a bot?

From: Matt Smith
Sent: Wednesday, April 15, 2020 2:23 PM
To: jane thorpe
Cc: dh.lo...@gmail.com; user@spark.apache.org; janethor...@aol.com; em...@yeikel.com
Subject: Re: Going it alone.

This is so entertaining.

1. Ask for help.
2. Compare those you need help from to a lower-order primate.
3. Claim you provided information you did not.
4. Explain that providing any information would be "too revealing".
5. ???

Can't wait to hear what comes next, but please keep it up. This is a bright spot in my day.
Re: [Pyspark] - Spark uses all available memory; unrelated to size of dataframe
The Web UI only shows "The Storage Memory column shows the amount of memory used and reserved for caching data." The Web UI does not show the values of -Xmx, -Xms or -Xss, so you are never going to learn the cause of an OutOfMemoryError or StackOverflowError from it. The visual tool is as useless as it can possibly be.

On Thursday, 16 April 2020, Yeikel wrote:

The memory that you see in Spark's UI page, under Storage, is not the memory used by your processing but the amount of memory that you persisted from your RDDs and DataFrames.

Read more here: https://spark.apache.org/docs/3.0.0-preview/web-ui.html#storage-tab

We need more details to be able to help you (sample code helps).
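For completeness: the heap sizes the Storage tab omits are visible from the Spark configuration and the running JVM itself. A minimal sketch, assuming a live SparkSession named spark (for example in spark-shell); the "1g" fallbacks are Spark's documented defaults:

// Inspect the JVM/Spark memory sizing that the Storage tab does not show.
val conf = spark.sparkContext.getConf
println("driver memory   = " + conf.get("spark.driver.memory", "1g"))    // -Xmx equivalent for the driver
println("executor memory = " + conf.get("spark.executor.memory", "1g"))  // -Xmx equivalent for executors
// Actual heap ceiling of the current (driver) JVM:
println("driver JVM max heap = " + Runtime.getRuntime.maxMemory / (1024 * 1024) + " MB")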
Re: Going it alone.
I did write a long email in response to you. But then I deleted it because I felt it would be too revealing.

On Tuesday, 14 April 2020, David Hesson wrote:

"I want to know if Spark is headed in my direction. You are implying Spark could be."

What direction are you headed in, exactly? I don't feel as if anything were implied when you were asked for use cases or what problem you are solving. You were asked to identify some use cases, of which you don't appear to have any.
Re: Going it alone.
That's what I want to know: use cases. I am looking for direction, as I described, and I want to know if Spark is headed in my direction. You are implying Spark could be. So tell me about the USE CASES and I'll do the rest.

On Tuesday, 14 April 2020, yeikel valdes wrote:

It depends on your use case. What are you trying to solve?
Going it alone.
Hi,

I consider myself to be quite good in software development, especially using frameworks. I like to get my hands dirty. I have spent the last few months understanding modern frameworks and architectures. I am looking to invest my energy in a product where I don't have to rely on the monkeys which occupy this space we call software development. I have found one that meets my requirements.

Would Apache Spark be a good tool for me, or do I need to be a member of a team to develop products using Apache Spark?
Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
Here is another tool I use: a logic analyser (from 7:55): https://youtu.be/LnzuMJLZRdU

You could also take some suggestions for improving query performance: https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1

Jane thorpe
janethor...@aol.com
Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
This tool may be useful for you to troubleshoot your problems away: https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html

"APM tools typically use a waterfall-type view to show the blocking time of different components cascading through the control flow within an application. These types of visualizations are useful, and AppOptics has them, but they can be difficult to understand for those of us without a PhD."

Especially helpful if you want to understand through visualisation and you do not have a PhD.

Jane thorpe
janethor...@aol.com
Re: covid 19 Data [DISCUSSION]
Thank you Sir,

I am currently developing a small OLTP web application using the Spring Framework. Although Spring Framework is open source, it is actually a professional product which comes with a professional code generator at https://start.spring.io/. The code generator is flawless and professional, like yourself.

I am using the following Java libraries to ingest (fetch) data across the Wide Area Network for processing. These libraries only became part of the standard JDK recently (java.net.http, JDK 11+).

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// declare temp store to prevent errors, by reading it only after the population process completes
List<LocationStats> newStats = new ArrayList<>();

// create a new HTTP client (java.net.http, standard since JDK 11)
HttpClient client = HttpClient.newHttpClient();

// create the request with the URL, using the builder pattern
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create(VIRUS_DATA_URL))
    .build();

// send the request and get the body of the response as a String
HttpResponse<String> httpResponse = client.send(request, HttpResponse.BodyHandlers.ofString());
// System.out.println(httpResponse.body());

I am also using the Commons CSV library (http://commons.apache.org/proper/commons-csv/user-guide.html) to process the raw data ready for display in the browser:

// read the whole CSV file
StringReader csvBodyReader = new StringReader(httpResponse.body());

// populate the list from each row, treating the first row as the table header
Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(csvBodyReader);
for (CSVRecord record : records) {
    LocationStats locationStat = new LocationStats();
    locationStat.setState(record.get("Province/State"));
    locationStat.setCountry(record.get("Country/Region"));
    int latestCases = Integer.parseInt(record.get(record.size() - 1));
    locationStat.setLatestTotalCases(latestCases);
    newStats.add(locationStat);
    System.out.println(locationStat);
}

Thank you once again, sir, for clarifying WEKA and its scope of use.

jane thorpe
janethor...@aol.com

-----Original Message-----
From: Teemu Heikkilä
To: jane thorpe
CC: user
Sent: Sun, 12 Apr 2020 22:33
Subject: Re: covid 19 Data [DISCUSSION]

Hi Jane!

The data you pointed to there is a couple of tens of MBs; I wouldn't exactly say it's "big data", and you definitely don't need Apache Spark to process that amount of data. I would suggest using some other tools for your processing needs. WEKA is a "full suite" for data analysis and visualisation, and it's probably a good choice for the task. If you want to go lower level, like with Spark, and you are familiar with Python, pandas could be a good library to investigate.

br,
Teemu Heikkilä
te...@emblica.com
+358 40 0963509

Emblica | The data engineering company
Kaisaniemenkatu 1 B, 00100 Helsinki
https://emblica.com
covid 19 Data [DISCUSSION]
Hi,

Three weeks ago a PhD guy proposed to start a project to use Apache Spark to help the WHO with predictive analysis using COVID-19 data. I have located the daily updated data; it can be found here: https://github.com/CSSEGISandData/COVID-19

I was wondering if Apache Spark is up to the job of handling BIG DATA of this size, or whether it would be better to use WEKA. Please discuss which product is more suitable.

Jane
janethor...@aol.com
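For scale: the daily-report CSVs in that repository are a few megabytes each, well within what a single machine handles. If you did want to try Spark on them anyway, a minimal sketch; the file path and the Country_Region/Confirmed column names are assumptions about the repository's 2020 layout, not verified here:

// Read one daily COVID-19 CSV with Spark; the local path is a placeholder
// for a file cloned from the GitHub repository.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/04-11-2020.csv")

df.printSchema()
// Aggregate confirmed cases per country (column names assumed, see above).
df.groupBy("Country_Region").sum("Confirmed").show(20)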
COVID 19 data
hi,

A PhD guy proposed to start a project for the WHO accumulated

jane thorpe
janethor...@aol.com
Re: Spark hangs while reading from jdbc - does nothing
You seem to be implying the error is intermittent. You seem to be implying data is being ingested via JDBC, so the connection has proven itself to be working, unless no data is arriving from the JDBC channel at all. If no data is arriving, then one could say it could be the JDBC. If the error is intermittent, then it is likely that a resource involved in processing is filling to capacity. Try reducing the data ingestion volume and see if that completes, then increase the data ingested incrementally. I assume you have run the job on a small amount of data, so you have completed your prototype stage successfully.

On Saturday, 11 April 2020, Mich Talebzadeh wrote:

Hi,

Have you checked your JDBC connections from Spark to Oracle? What is Oracle saying? Is it doing anything, or hanging?

set pagesize
set linesize 140
set heading off
select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD HH:MI AM') from v$database;
set heading on
column spid heading "OS PID" format a6
column process format a13 heading "Client ProcID"
column username format a15
column sid format 999
column serial# format 9
column STATUS format a3 HEADING 'ACT'
column last format 9,999.99
column TotGets format 999,999,999,999 HEADING 'Logical I/O'
column phyRds format 999,999,999 HEADING 'Physical I/O'
column total_memory format 999,999,999 HEADING 'MEM/KB'
--
SELECT
  substr(a.username,1,15) "LOGIN"
, substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
, TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
, substr(a.machine,1,10) HOST
, substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
, substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
, substr(a.program,1,15) PROGRAM
--, ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
, ( select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
    where ss.sid = a.sid
    and sn.statistic# = ss.statistic#
    and
--  sn.name in ('session pga memory')
    sn.name in ('session pga memory','session uga memory')
  ) AS total_memory
, (b.block_gets + b.consistent_gets) TotGets
, b.physical_reads phyRds
, decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
, CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
FROM v$process p, v$session a, v$sess_io b
WHERE a.paddr = p.addr
AND p.background IS NULL
--AND a.sid NOT IN (select sid from v$mystat where rownum = 1)
AND a.sid = b.sid
AND a.username is not null
--AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
--AND CURRENT_DATE - logon_time > 0
--AND a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me
--AND (b.block_gets + b.consistent_gets) > 0
ORDER BY a.username;
exit

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On Fri, 10 Apr 2020 at 17:37, Ruijing Li wrote:

Hi all,

I am on Spark 2.4.4, using Scala 2.11.12, and running cluster mode on Mesos. I am ingesting from an Oracle database using spark.read.jdbc. I am seeing a strange issue where Spark just hangs and does nothing, not starting any new tasks. Normally this job finishes in 30 stages, but sometimes it stops at 29 completed stages and doesn't start the last stage. The Spark job is idling and there is no pending or active task. What could be the problem?

Thanks.
--
Cheers,
Ruijing Li
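A concrete way to apply the advice above about reducing, then incrementally increasing, the ingestion volume is to bound and partition the JDBC read so that no single connection has to pull the whole table. A minimal sketch; the URL, table, and column names are placeholders, not details from this thread:

// Partitioned JDBC read: Spark opens numPartitions parallel connections,
// each reading one slice of the partitionColumn range.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")   // placeholder URL
  .option("dbtable", "(select * from big_table) t")           // placeholder table
  .option("user", "scott")
  .option("password", "tiger")
  .option("partitionColumn", "id")   // numeric column to split on (placeholder)
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")      // 8 parallel JDBC connections
  .option("fetchsize", "10000")      // rows per round trip; the Oracle driver default is low
  .load()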
Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt
Hi Som,

The HdfsWordCount program counts words from files you place in a directory named by argv[args.length - 1], while the program runs in a for (;;) loop until the user presses CTRL-C.

Why does the program name have the prefix "Hdfs" (Hadoop Distributed File System)? Is it a program which demonstrates HDFS, or streaming? I am really, really confused by this program.

ExceptionHandlingTest confuses me too. What exception handling is being tested: the JVM's "throw new Exception" syntax when a value is greater than 0.75, or something meant to test Spark API exception handling?

package org.apache.spark.examples

import org.apache.spark.sql.SparkSession

object ExceptionHandlingTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("ExceptionHandlingTest")
      .getOrCreate()

    spark.sparkContext.parallelize(0 until spark.sparkContext.defaultParallelism).foreach { i =>
      if (math.random > 0.75) {
        throw new Exception("Testing exception handling")
      }
    }

    spark.stop()
  }
}

On Monday, 6 April 2020, Som Lima wrote:

Ok. Try this one instead (link below). It has both an EXIT, which we know is rude and abusive rather than graceful structured programming, and it also includes half-hearted user input validation.

Do you think millions of Spark users download and test these programmes and repeat this rude programming behaviour? I don't think they have any coding rules like the safety-critical software industry. But they do have strict emailing rules. Do you think email rules are far more important than programming rules and guidelines?

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/clickstream/PageViewStream.scala
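On the HDFS-or-streaming question: HdfsWordCount is a streaming example; the "Hdfs" prefix only says where it watches for new files. Its core is textFileStream, which picks up files created in a directory after the job starts, which is why it keeps running until interrupted. A minimal sketch in the spirit of the linked example, not a verbatim copy; the watched path is a placeholder:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Watch a directory and word-count each new file that lands in it.
object HdfsWordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HdfsWordCountSketch")
    val ssc = new StreamingContext(conf, Seconds(2))
    // textFileStream only picks up files created in the directory after start()
    val lines = ssc.textFileStream("hdfs://127.0.0.1:9000/watched-dir")
    val counts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
    counts.print()
    ssc.start()
    ssc.awaitTermination() // runs until interrupted (the "for (;;)" behaviour)
  }
}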
Fwd: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt
Hi Som,

Did you know that the simple demo program for reading characters from a file didn't work? Who wrote that simple hello-world type little program?

jane thorpe
janethor...@aol.com
(float(9)/5)*x + 32) when x = 12.8
PLATFORM: Zeppelin 0.9
SPARK_HOME = spark-3.0.0-preview2-bin-hadoop2.7

%spark.ipyspark
# work around
sc.setJobGroup("a","b")
tempc = sc.parallelize([12.8])
tempf = tempc.map(lambda x: (float(9)/5)*x + 32)
tempf.collect()

OUTPUT: [55.046]

%spark.ipyspark
# work around
sc.setJobGroup("a","b")
tempc = sc.parallelize([38.4,19.2,13.8,9.6])
tempf = tempc.map(lambda x: (float(9)/5)*x + 32)
tempf.collect()

OUTPUT: [101.12, 66.56, 56.84, 49.28]

Calculator result = 55.04

Is the answer correct when x = 12.8?

jane thorpe
janethor...@aol.com
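For what it's worth: the exact decimal answer is (9/5) * 12.8 + 32 = 23.04 + 32 = 55.04, so the calculator is right. 12.8 has no exact binary representation, so IEEE-754 doubles carry rounding noise in the last digits, and the [55.046] above looks like a truncated rendering of such a value. A quick check in Scala, which uses the same double arithmetic as the Python lambda:

// 12.8 is not exactly representable in binary floating point, so the
// computed double differs from the exact decimal 55.04 in its last bits.
val x = 12.8
val f = (9.0 / 5.0) * x + 32
println(f)          // prints 55.0400000000000... with rounding noise in the trailing digits
println(f"$f%.2f")  // rounds to 55.04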
Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt
Thanks darling. I tried this and it worked:

hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000

scala> :paste
// Entering paste mode (ctrl-D to finish)

val textFile = sc.textFile("hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://127.0.0.1:9000/hdfs/spark/examples/README7.out")

// Exiting paste mode, now interpreting.

textFile: org.apache.spark.rdd.RDD[String] = hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt MapPartitionsRDD[91] at textFile at <console>:27
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[94] at reduceByKey at <console>:30

scala> :quit

jane thorpe
janethor...@aol.com

-----Original Message-----
From: Som Lima
CC: user
Sent: Tue, 31 Mar 2020 23:06
Subject: Re: HDFS file

Hi Jane

Try this example:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala

Som
HDFS file
hi,

Are there setup instructions on the website for spark-3.0.0-preview2-bin-hadoop2.7, so I can run the same program against HDFS?

val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

This works locally:

val textFile = sc.textFile("/data/README.md")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("/data/wordcount")

textFile: org.apache.spark.rdd.RDD[String] = /data/README.md MapPartitionsRDD[23] at textFile at <console>:28
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[26] at reduceByKey at <console>:31

br
Jane