Greetings everyone,
I want to share with the community some work I've been doing on
implementing proper support for the HEJI notation system in LilyPond. I
wouldn't call it stable quite yet, as I'm still open to making breaking
changes to the interface based on feedback. However, I would say it
e changes and generating a new configure script,
the output changes depending on RATIONAL_NUMBERS_WANTED, as expected.
Best regards,
Gylfi
Hi.
Your code looks like this, right?
joined_dataset = show_channel.join(show_views); joined_dataset.take(4)
Well, joined_dataset is now an array (because you used .take(4)), so it no
longer supports any RDD operations.
Could that be the problem?
Otherwise more code is needed to
1) Start by looking at MLlib or KeystoneML.
2) If you can't find an implementation, start by analyzing the access
patterns and data manipulations you will need to implement.
3) Then figure out if it fits Spark structures.. and when you realize it
doesn't, start speculating on how you can twist or
Hi.
Can't you do a filter to get only the ABC shows, map that into a keyed
instance of the show, and then do a reduceByKey to sum up the views?
Something like this in Scala (filter for the channel, then map to a new
(show, view count) pair):
val myAnswer = joined_dataset.filter( _._2._1 == "ABC" )
  .map { case (show, (channel, views)) => (show, views) }
  .reduceByKey(_ + _)
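The same filter / map / reduceByKey pipeline can be sanity-checked in plain Python, no Spark needed. The sample records below are made up, shaped like (show, (channel, views)) as the join above produces:

```python
from collections import defaultdict

# Hypothetical joined records, shaped like (show, (channel, views)).
joined = [
    ("ShowA", ("ABC", 10)),
    ("ShowA", ("ABC", 5)),
    ("ShowB", ("CBS", 7)),
    ("ShowC", ("ABC", 3)),
]

# filter for the channel, then map to (show, views) pairs ...
pairs = [(show, views) for show, (channel, views) in joined if channel == "ABC"]

# ... then sum per key, which is what reduceByKey(_ + _) does per show.
per_show = defaultdict(int)
for show, views in pairs:
    per_show[show] += views

print(dict(per_show))  # {'ShowA': 15, 'ShowC': 3}
```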
Look at KeystoneML; there is an image-processing pipeline there.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/partition-RDD-of-images-tp25515p25518.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Can't you just access it by element, like with [0] and [1]?
http://www.tutorialspoint.com/python/python_tuples.htm
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-work-with-a-joined-rdd-in-pyspark-tp25510p25517.html
"spark.storage.memoryFraction 0.05"
If you want to store a lot of data in memory, I think this must be a higher
fraction.
The default is 0.6 (not 0.0X).
To change the output directory you can set "spark.local.dir=/path/to/dir",
and you can even specify multiple directories (for example if you have
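A minimal sketch of how these two settings might look in spark-defaults.conf (the paths are placeholders; spark.storage.memoryFraction applies to the legacy static memory manager):

```
# Fraction of the JVM heap reserved for cached RDDs (default 0.6)
spark.storage.memoryFraction  0.6
# Local/scratch directories; comma-separate to spread over several disks
spark.local.dir               /path/to/dir1,/path/to/dir2
```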
HDFS has a default replication factor of 3, so a 3.8 TB dataset occupies
roughly 3 x 3.8 ≈ 11.4 TB of raw disk on HDFS.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-a-3-8-T-dataset-take-up-11-59-Tb-on-HDFS-tp25471p25497.html
ckage.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
... 13 more"
I have already set akka.timeout to 300, etc.
Anyone have any ideas on what the problem could be?
Regards,
Gylfi.
Hi.
You may want to look into IndexedRDD:
https://github.com/amplab/spark-indexedrdd
Regards,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-lookup-by-a-key-in-an-RDD-tp25243p25247.html
Hi.
What is slow exactly?
In code-base 1: when you run the persist() + count(), you store the result
in RAM. Then the map + reduceByKey is done on in-memory data.
In the latter case (all-in-one-line) you are doing both steps at the same
time.
So you are saying that if you sum up the time to
By default Spark will actually not keep the data at all; it will just store
"how" to recreate the data.
The programmer can however choose to keep the data once instantiated by
calling .persist() or .cache() on the RDD.
.cache() will store the data in memory only and fail if it will not
on how much RAM you have per node, you may want to re-block the
data on HDFS for optimal performance.
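The recompute-versus-cache behaviour described above can be imitated in plain Python, no Spark needed (all names here are made up for illustration): re-running a "recipe" function stands in for an unpersisted RDD being recreated on every action, and materializing its result once plays the role of .persist():

```python
# Counts how many times the "transformation" actually runs.
calls = 0

def expensive_transform(x):
    global calls
    calls += 1
    return x * 2

def run_recipe():
    # Like an unpersisted RDD: the whole recipe runs again each time.
    return [expensive_transform(x) for x in range(5)]

sum(run_recipe())   # first "action": 5 calls
sum(run_recipe())   # second "action": 5 more calls, recomputed from scratch

# Like .persist()/.cache(): materialize once, reuse afterwards.
cached = run_recipe()       # 5 more calls, the last ones
sum(cached)
sum(cached)                 # no new calls: reads the materialized data

print(calls)  # 15
```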
Hope this helps,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Why-the-length-of-each-task-varies-tp24008p24014.html
You may want to look into using the pipe command:
http://blog.madhukaraphatak.com/pipe-in-spark/
http://spark.apache.org/docs/0.6.0/api/core/spark/rdd/PipedRDD.html
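What RDD.pipe does for each partition (write the string form of every element, one per line, to an external command's stdin, and read the command's stdout lines back as elements) can be sketched with plain subprocess; this assumes a Unix `tr` on the PATH:

```python
import subprocess

# Stand-in for one partition's elements.
records = ["spark", "pipe", "demo"]

# "tr a-z A-Z" plays the role of the external script given to RDD.pipe.
result = subprocess.run(
    ["tr", "a-z", "A-Z"],
    input="\n".join(records),  # one element per input line
    capture_output=True,
    text=True,
    check=True,
)

# Each stdout line becomes one element of the piped "RDD".
output = result.stdout.strip().split("\n")
print(output)  # ['SPARK', 'PIPE', 'DEMO']
```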
structure as it is not synced between workers after it is broadcasted.
To broadcast, your data must be serializable.
If the data you are trying to broadcast is a distributed RDD (and thus
presumably large), perhaps what you need is some form of join operation (or
cogroup)?
Regards,
Gylfi
Hi.
Assuming you have the data in an RDD, you can save your RDD (regardless of
structure) with nameRDD.saveAsObjectFile(path), where path can be
hdfs:///myfolderonHDFS or the local file system.
Alternatively you can also use .saveAsTextFile().
Regards,
Gylfi.
into more parts before line 52 by calling
rddname.repartition(10), for example, and see if it runs faster.
Regards,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-same-execution-time-on-1-node-and-5-nodes-tp23866p23893.html
code would be something like this:
val flattenedIntRDD: RDD[Int] = intArraysRDD.flatMap(array => array.toList)
However, to understand your problem exactly, you need to explain better what
the RDD you want to create should look like.
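For intuition, the effect of that flatMap can be reproduced on plain Python lists (the sample arrays are made up): each array is mapped to its elements and the results are concatenated.

```python
# Stand-in for intArraysRDD: a local list of integer arrays.
int_arrays = [[1, 2, 3], [4], [5, 6]]

# flatMap = map each array to its elements, then concatenate.
flattened = [x for array in int_arrays for x in array]

print(flattened)  # [1, 2, 3, 4, 5, 6]
```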
Regards,
Gylfi.
both RDDs are destroyed again.
If you run the myrdd2.count again, both myrdd and myrdd2 are created
again.
If your transformation is expensive, you may want to keep the data around,
and for that you must use .persist() or .cache().
Regards,
Gylfi.
You could even try changing the block size of the input data on HDFS (it
can be done on a per-file basis) and that would get all workers going right
from the get-go in Spark.
? Does this make any sense? :)
Regards,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/K-Nearest-Neighbours-tp23759p23899.html
;)
Regards and good luck,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-black-list-nodes-on-the-cluster-tp23650p23704.html
Hi.
I am just wondering if the rdd was actually modified.
Did you test it by printing rdd.partitions.length before and after?
Regards,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-do-we-control-output-part-files-created-by-Spark-job
actually correct?
Hope this helps..
Regards,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Why-Kryo-Serializer-is-slower-than-Java-Serializer-in-TeraSort-tp23621p23659.html
Hi.
Have you tried to repartition the finalRDD before saving?
This link might help.
http://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter3/save_the_rdd_to_files.html
Regards,
Gylfi.
spark.speculation.multiplier
spark.speculation.quantile
See https://spark.apache.org/docs/latest/configuration.html under
Scheduling.
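For reference, a sketch of those knobs in spark-defaults.conf (the multiplier and quantile values shown are the documented defaults; spark.speculation itself defaults to false and must be enabled first):

```
# Re-launch slow-running tasks speculatively
spark.speculation             true
# How many times slower than the median a task must be to be re-launched
spark.speculation.multiplier  1.5
# Fraction of tasks that must complete before speculation is considered
spark.speculation.quantile    0.75
```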
Regards,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-black-list-nodes-on-the-cluster
future? If not, can
somebody give me an idea of what would be a reasonable way to roll my own?
BTW, PdfBox has no problems in terms of text extraction using Type3 fonts,
at least not in the testing that I've done so far, but I prefer using
iTextSharp for this if I can get it to work.
Thanks,
Gylfi
) {
currentPdfReaderInstance = GetPdfReaderInstance(reader);
}
return currentPdfReaderInstance.GetNewObjectNumber(number, generation);
}
Gylfi
-Original Message-
From: Gylfi Ingvason [mailto:gylfi.ingva...@solimarsystems.com]
Sent: Thursday, August 05, 2010 3:59 PM
To: 'Post all
that there is a better way to fix the problem although
we are getting by at the moment.
Any feedback would be greatly appreciated.
Thanks - Gylfi
-Original Message-
From: Paulo Soares [mailto:psoa...@glintt.com]
Sent: Wednesday, August 04, 2010 12:36 PM
To: Post all your questions about iText
to patch the latest?
Any feedback would be appreciated.
Thanks - Gylfi
Use IronPython and iTextSharp. No wrappers needed.
-Original Message-
From: 1T3XT info [mailto:i...@1t3xt.info]
Sent: Thursday, April 22, 2010 12:04 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] Python 2.5 wrapper for iTextSharp
Ignacio.ruizdeconejo
You might want to try to use PdfSmartCopy instead of PdfWriter - it caches
stream objects.
-Original Message-
From: Fred Cohen [mailto:f...@all.net]
Sent: Thursday, March 25, 2010 9:21 AM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Producing images without
Don't know about the Java version of iText, but last time I checked,
iTextSharp did not support generating PDF files greater than 2 GB and my
impression from Paulo was that this was deliberate and that adding that
support was not being planned.
_
From: Leonard Rosenthol
and I would highly recommend it, as your data is going to be the best
indicator of what will happen.
Best of luck,
Gylfi
_
From: Jason Berk [mailto:jb...@purdueefcu.com]
Sent: Monday, March 22, 2010 2:46 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions
the community well in the long run.
Thanks - Gylfi
to do this on a template being added.
Any ideas on how this can be accomplished?
Thanks - Gylfi
.
Leonard
-Original Message-
From: Gylfi Ingvason [mailto:gylfi.ingva...@solimarsystems.com]
Sent: Wednesday, December 16, 2009 3:40 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Adding a /Name to an XObject Form containing an
imported page
Gents,
I use PdfWriter
what it takes.
Gylfi
-Original Message-
From: Leonard Rosenthol [mailto:lrose...@adobe.com]
Sent: Saturday, November 21, 2009 5:28 PM
To: gylfi.ingva...@solimarsystems.com; Post all your questions about iText
here
Subject: RE: [iText-questions] AcroForm with empty Fields array
Awesome! Thanks a lot Bruno - you guys are the best.
Gylfi
-Original Message-
From: 1T3XT info [mailto:i...@1t3xt.info]
Sent: Monday, November 23, 2009 10:06 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] AcroForm with empty Fields array
Gylfi Ingvason
- it
is just a file with a damaged Xref - no big deal. Same thing in Acrobat:
you get a "Do you want to save your changes?" dialog when you close the
viewer, but otherwise there is no hint that the file is garbage.
Anyway, thanks again for taking the time to look at the file and respond.
Best regards,
Gylfi
to patch the iText code such that form flattening
takes place on Annots not referenced in the Fields array of the form?
Best regards,
Gylfi
FormFile.pdf
Description: Adobe PDF document
big brain fart on behalf of Adobe and that it should
be fixed on that end.
Thanks - Gylfi
-Original Message-
From: Paulo Soares [mailto:psoa...@glintt.com]
Sent: Monday, October 05, 2009 6:24 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] Color shift
displays correct colors.
You used iText to modify it, and now it doesn't. That person is not going
to understand the logic that the color was incorrectly defined from the
beginning.
Thanks - Gylfi
-Original Message-
From: Leonard Rosenthol [mailto:lrose...@adobe.com]
Sent: Tuesday
Paulo,
Enclosed are two small files that demonstrate the Transparency Group
problem. The BeforeImport.pdf shows an image in a light blue color, and the
AfterImport.pdf is the same file after being imported using PdfWriter. The
color shift should be clearly noticeable in Acrobat.
Thanks - Gylfi
an RGB page
group and a CMYK page group, and both pages are included in a single page,
what should be the resulting page group?
Paulo
- Original Message -
From: Gylfi Ingvason gylfi.ingva...@solimarsystems.com
To: 'Post all your questions about iText here'
itext-questions
sure the imported page is kept intact. The PdfCopy
works fine (and PdfStamper presumably as well), but in that case the
imported page object is preserved as-is as opposed to being converted into
an XObject Form.
Thanks - Gylfi
/StructParents 0 /Contents 73 0 R /Rotate 0
/Group << /CS /DeviceRGB /S /Transparency /Type /Group >>
/MediaBox [0 0 612 792]
/Resources << /Font << /F1 70 0 R /F2 74 0 R >>
  /ProcSet [/PDF /Text /ImageB /ImageC /ImageI] >>
/Type /Page
endobj
Gylfi
-Original Message-
From: Paulo Soares [mailto:psoa...@glintt.com]
Sent: Monday, September
and missing Transparency Group
I need a PDF to work on.
Paulo
- Original Message -
From: Gylfi Ingvason gylfi.ingva...@solimarsystems.com
To: 'Post all your questions about iText here'
itext-questions@lists.sourceforge.net
Sent: Monday, September 21, 2009 7:53 PM
Subject: Re: [iText-questions