Hi,
I'm trying to install Spark 1 on my Hadoop cluster running on EMR. I
didn't have any problem installing the previous versions, but in this
version I can't find any 'sbt' folder. However, the README still
suggests installing Spark with:
./sbt/sbt assembly
which fails:
./sbt/sbt:
Hi,
I've got the following code http://pastebin.com/3kexKwg6 that's almost
complete, but I have 2 questions:
1) Once I've computed the TF-IDF vectors, how do I compute the vector for
each string to feed into LabeledPoint?
2) Does MLlib provide any methods to evaluate the model's precision,
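For question 1, the usual MLlib recipe is to run the strings through HashingTF and IDF and then pair each resulting vector with its label in a LabeledPoint(label, features). Spark can't run here, so below is a plain-Python sketch of the hashing-TF/IDF mechanics only (not the MLlib API itself; the feature width and sample documents are made up):

```python
import math

def hashing_tf(tokens, num_features=16):
    # Bucket term counts by hash, the same idea as MLlib's HashingTF.
    vec = [0.0] * num_features
    for tok in tokens:
        vec[hash(tok) % num_features] += 1.0
    return vec

def idf_weights(tf_vectors):
    # MLlib-style IDF: log((numDocs + 1) / (docFreq + 1)) per feature.
    n = len(tf_vectors)
    width = len(tf_vectors[0])
    return [math.log((n + 1.0) / (sum(1 for v in tf_vectors if v[i] > 0) + 1.0))
            for i in range(width)]

# Hypothetical labeled strings standing in for the pastebin data.
docs = [("1.0", "buy cheap pills now"), ("0.0", "meeting moved to noon")]
tf = [hashing_tf(text.split()) for _, text in docs]
weights = idf_weights(tf)

# Each string becomes one TF-IDF vector; pairing it with its label is
# exactly the (label, features) pair that LabeledPoint(label, vector) wants.
labeled = [(float(label), [t * w for t, w in zip(vec, weights)])
           for (label, _), vec in zip(docs, tf)]
```

In actual pyspark the same three steps use pyspark.mllib.feature.HashingTF / IDF on an RDD of token lists, then zip the TF-IDF RDD with the labels.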
Thanks Suresh, that worked like a charm!
I created the /user/hive/warehouse directory and chmod'd it to 777.
regards,
imran
On Wed, Feb 24, 2016 at 2:48 PM, Suresh Thalamati <
suresh.thalam...@gmail.com> wrote:
> Try creating the /user/hive/warehouse/ directory if it does not exist, and
> check it
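Suresh's fix as commands. /user/hive/warehouse is the usual default warehouse location in HDFS; the local directory below is only an illustrative stand-in for when the warehouse is on the local filesystem:

```shell
# On a real cluster the warehouse lives in HDFS:
#   hdfs dfs -mkdir -p /user/hive/warehouse
#   hdfs dfs -chmod 777 /user/hive/warehouse
# Local-filesystem equivalent (e.g. a local-mode metastore):
mkdir -p /tmp/hive/warehouse
chmod 777 /tmp/hive/warehouse
ls -ld /tmp/hive/warehouse
```

Note that 777 is the quick fix from the thread; on a shared cluster you would normally grant access to a hive group instead of opening the directory to everyone.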
Thanks Michael,
I'm trying to implement the code in pyspark like so (where my dataframe has
3 columns - customer_id, dt, and product):
st = StructType().add("dt", DateType(), True).add("product", StringType(), True)
top = data.select("customer_id", st.alias('vs')).groupBy("customer_id")
I have a use case similar to this:
http://stackoverflow.com/questions/33878370/spark-dataframe-select-the-first-row-of-each-group
and I'm trying to understand the solution titled "ordering over structs":
1) Is a struct in Spark like a struct in C++?
2) What is an alias in this context?
3) How
.struct>
> (which creates an actual struct), you are trying to use the struct datatype
> (which just represents the schema of a struct).
>
> On Thu, Apr 7, 2016 at 3:48 PM, Imran Akbar <skunkw...@gmail.com> wrote:
>
>> thanks Michael,
>>
>>
>> I'm tr
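The distinction Michael draws is between pyspark.sql.functions.struct (a function that builds a struct column) and StructType (which only describes a schema). A struct column orders field by field, so taking max(struct(dt, product)) per customer_id picks each customer's row with the latest dt; in pyspark that is roughly F.max(F.struct("dt", "product")) after groupBy("customer_id"). Since struct ordering works like Python tuple comparison, the idea can be sketched in plain Python (sample rows are hypothetical):

```python
# Rows shaped like the dataframe above: (customer_id, dt, product).
rows = [
    ("c1", "2016-01-05", "widget"),
    ("c1", "2016-03-01", "gadget"),
    ("c2", "2016-02-10", "sprocket"),
]

# max(struct(dt, product)) per customer: a struct compares by its first
# field, then the next, exactly like a Python tuple.
best = {}
for cust, dt, product in rows:
    key = (dt, product)
    if cust not in best or key > best[cust]:
        best[cust] = key
```

So the "alias" in the Stack Overflow answer is just the name given to that struct column so its fields can be selected back out afterwards.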
Hi,
I'm reading in a CSV file, and I would like to write it back as a permanent
table, but with partitioning by year, etc.
Currently I do this:
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
df =
sqlContext.read.format('com.databricks.spark.csv').options(header='true',
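For the "permanent table partitioned by year" part, DataFrameWriter supports partitionBy (Spark 1.4+), e.g. df.write.partitionBy('year').saveAsTable('mytable'), which lays the data out in year=YYYY/ subdirectories. A plain-Python sketch of that Hive-style on-disk layout (all paths and rows hypothetical):

```python
import csv
import os
import tempfile
from collections import defaultdict

rows = [
    {"year": "2015", "product": "widget", "amount": "9.99"},
    {"year": "2016", "product": "gadget", "amount": "4.50"},
    {"year": "2016", "product": "sprocket", "amount": "1.25"},
]

table_root = tempfile.mkdtemp(prefix="prices_")

# Group rows by the partition column, then write one file per partition
# directory, Hive-style: <root>/year=2016/part-00000.csv
by_year = defaultdict(list)
for r in rows:
    by_year[r["year"]].append(r)

for year, part_rows in by_year.items():
    part_dir = os.path.join(table_root, "year=%s" % year)
    os.makedirs(part_dir)
    with open(os.path.join(part_dir, "part-00000.csv"), "w") as f:
        w = csv.DictWriter(f, fieldnames=["product", "amount"])
        w.writeheader()
        # The partition column lives in the directory name, not the file.
        for r in part_rows:
            w.writerow({"product": r["product"], "amount": r["amount"]})
```

Because the partition value is encoded in the path, a query filtered on year only has to read the matching subdirectories.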
Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On
I have some Python code that consistently ends up in this state:
ERROR:py4j.java_gateway:An error occurred while trying to connect to the
Java server
Traceback (most recent call last):
  File "/home/ubuntu/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
    line 690, in start
I'm trying to save a table using this code in pyspark with 1.6.1:
prices = sqlContext.sql("SELECT AVG(amount) AS mean_price, country FROM src
GROUP BY country")
prices.collect()
prices.write.saveAsTable('prices', format='parquet', mode='overwrite',
path='/mnt/bigdisk/tables')
but I'm getting